Structured Data

Eric Schmidt, Executive Chairman, Google:

"Within search results, information tied to verified online profiles will be ranked higher than content without such verification, which will result in most users naturally clicking on the top (verified) results. The true cost of remaining anonymous, then, might be irrelevance."

As of late 2012, there are about a trillion static, indexed web pages, and about a billion web sites. If one counts dynamically created web pages, the number of pages rises by a very significant factor.


The challenge for search engines is making sense of the information. Suppose a page contains the phrases "John Jones", "Governor Smith", and "Littleton, CO" and an image called "old-town-building.jpg." What's the relationship? Does John Jones support the active candidacy of Governor Smith? Did historical Governor Smith, 150 years ago, found the city of Littleton? Is the building in the image where John Jones spoke in support of Governor Smith last night? Last week, did John Jones buy the building that historical figure Governor Smith built in 1873, and open a restaurant there?

The interrelationship between elements determines the page's meaning. To determine if this page is about current politics, history, or a local business, a search engine must understand the relationships.

Structured Data helps disambiguate. Let's look at an example from the karlkelman.com site:

<div style="display:inline;" itemscope itemtype="http://schema.org/Place">
<h2>Karl Kelman Skiing <span itemprop="name">Turtle Creek Trail</span>:</h2>
<meta itemprop="containedIn" content="Loveland Basin Ski Area">

Using the Microdata language, the code above tells search engines that this section of the web page is about a place named "Turtle Creek Trail," which is found in a larger entity called Loveland Basin Ski Area.

The next section of Structured Data code further disambiguates:

<div style="display:inline;" itemprop="geo" itemscope itemtype="http://schema.org/GeoCoordinates">
<meta itemprop="latitude" content="39.672858" />
<meta itemprop="longitude" content="-104.906417" />
<meta itemprop="elevation" content="3473" />
</div>

There could be thousands of places named "Turtle Creek" in the world, but only one has those geographical coordinates (data courtesy of Google Earth).

I'm also able to use Structured Data to provide a description:

<p itemprop="description">Turtle Creek is an easy ski run at the Loveland Basin Ski Area...
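Put together, the fragments above might nest roughly as follows. This is only a sketch: it assumes the geo block and the description both sit inside the outer Place div, and the closing tags, indentation, and comments are added for illustration. The longitude is written as negative on the assumption that the ski run lies west of Greenwich. The description stays truncated, as in the excerpt.

```html
<!-- Sketch only: nesting, closing tags, and indentation are assumptions -->
<div style="display:inline;" itemscope itemtype="http://schema.org/Place">
  <h2>Karl Kelman Skiing <span itemprop="name">Turtle Creek Trail</span>:</h2>
  <meta itemprop="containedIn" content="Loveland Basin Ski Area">

  <!-- The nested item becomes the value of the Place's "geo" property -->
  <div style="display:inline;" itemprop="geo" itemscope itemtype="http://schema.org/GeoCoordinates">
    <meta itemprop="latitude" content="39.672858" />
    <!-- Negative sign assumed: longitudes west of Greenwich are negative -->
    <meta itemprop="longitude" content="-104.906417" />
    <meta itemprop="elevation" content="3473" />
  </div>

  <p itemprop="description">Turtle Creek is an easy ski run at the Loveland Basin Ski Area...</p>
</div>
```

Because the GeoCoordinates item is nested inside the Place item, a parser can attach the coordinates to "Turtle Creek Trail" rather than to the page as a whole.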

Here's a graphical representation of the relationship data I've provided for search engines:

[Image: Structured Data Example - Turtle Creek]

As of this writing (2012), Google, Bing, Yahoo! and some other search engines have agreed to use the Microdata language for Structured Data.

A search engine must understand a web page's content to generate useful traffic. My personal site's Turtle Creek page is about skiing at Loveland Basin. I don't want site visitors who are looking for information on the town of Turtle Creek, in New Brunswick, Canada. If a web page confuses a search engine, the search engine's probable response is to omit the page from search results.

If Structured Data helps clarify the match between a search phrase (for example, "Turtle Creek Ski Run Loveland Basin") and the content page (my example above), the page is more likely to be listed. Some Google Search results are affected by Structured Data today (snippets on most search engine results pages, video search results, etc.), and the image and text shown by the Facebook Send and Like buttons can be controlled by Structured Data right now. The impact of Structured Data is likely to increase significantly.

Structured Data is probably the core of Internet evolution over the next few years: a knowledge of relationships and context will enable search engines to deliver far more relevant and actionable results. Like every Internet technology, Structured Data is evolving, but it's easier to tweak existing implementations than to wait for the mythical "final version" before you begin coding. Elements of Structured Data usage will change, but you'll keep up to speed by starting implementation now.

Google Site Search makes extensive use of Structured Data currently, and is believed by many to be a test bed for wider use of Structured Data in ordering Search Engine Results Pages.

Facebook uses a complementary, but slightly different, Structured Data language called Open Graph. I generally include a few lines of Open Graph code at the top of my pages, which helps structure the image and text used by the Facebook Send Button.

This Open Graph Structured Data Code:

<meta property="og:type" content="website" />
<meta property="og:url" content="http://www.karlkelman.com/skipictures/loveland/lovebyrun/chairtwo/turtle-creek/turtle-creek" />
<meta property="og:title" content="Turtle Creek Ski Trail | Loveland Basin Ski Area | Colorado Ski Areas" />
<meta property="og:description" content="Turtle Creek Ski Run Pictures and Video at the Loveland Basin Ski Area. Turtle Creek is a Beginner Ski Trail at Loveland, Running From the Intersection of North Turtle Creek and South Turtle Creek." />
<meta property="og:image" content="http://www.karlkelman.com/skipictures/loveland/lovebyrun/chairtwo/turtle-creek/turtle-creek-thumbnail-250x250.jpg" />

Creates this result for the Facebook Send Button:


[Image: Structured Data Example - Turtle Creek]

The same Open Graph Structured Data controls the image and text displayed on the user's Facebook page. I've used a slightly different page for the example here:

[Image: Structured Data Example - Turtle Creek]

Computers can deal with far greater volumes of information than humans, but humans are currently better at natural language, comprehending context, and understanding the relationships between things, ideas, people, etc. Computers don't understand, for example, the idea of sarcasm or humor at all, but they can remember 10,000 names and addresses instantly without a single mistake (or with 10,000 mistakes, if you've got a bad PHP script...).

On my web page about skiing the Turtle Creek Ski Run, it would be confusing and unpleasant for a human to read the geographical coordinates of the ski run, the exact details of when the video was shot, detailed copyright information, etc. For a human, that's too much information. Humans can, however, make relationship inferences and disambiguate easily. They might immediately know, once they see video and pictures of ski runs and skiers, that the "Loveland" referenced by the page is the Loveland Basin Ski Area, and not the town of Loveland, Colorado, or Loveland, Ohio. They might infer from the domain "karlkelman.com" that the author of the page is Karl Kelman.


Since there's a township of Turtle Creek, Ohio, and a city of Loveland, Ohio, it could be easy for a bot to conclude the page is about Ohio. The Geocoordinates eliminate any potential confusion in this regard. Humans might conclude from the ski-oriented nature of the site that the page isn't about Ohio, but computers benefit from very specific instructions.
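The same reasoning applies to authorship, which a human might guess from the domain name. As a rough sketch, the author could be spelled out for bots like this. The WebPage wrapper and the nested Person item are assumptions made for illustration, not markup taken from the page:

```html
<!-- Hypothetical wrapper: declares the page itself as a schema.org WebPage -->
<div itemscope itemtype="http://schema.org/WebPage">
  <!-- The nested Person item is the value of the page's "author" property -->
  <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    <meta itemprop="name" content="Karl Kelman">
  </span>
</div>
```

Because the name is carried in a meta element, human visitors see nothing extra, while a bot no longer has to infer the author from the domain.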

A computer benefits from Structured Data identifying Turtle Creek as a ski run contained within the Loveland Basin Ski Area, with very specific geographical information. Humans might look at the pictures and determine the page is about an Alpine setting, but bots have a difficult time drawing that sort of conclusion from photos. Structured Data informing the bot that the Place in question is 3,473 meters above sea level is much more helpful to it, and may help direct visitors to the page from searches like "high altitude skiing," but not from searches like "catfish fishing in Turtle Creek."
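One possible way to make the "contained within" relationship more explicit is to give the container its own type instead of a plain text value. The sketch below assumes schema.org's SkiResort type for the ski area. It illustrates the idea and is not the markup used on the page:

```html
<div itemscope itemtype="http://schema.org/Place">
  <span itemprop="name">Turtle Creek Trail</span>
  <!-- Assumption: the containing ski area is marked up as a SkiResort item -->
  <div itemprop="containedIn" itemscope itemtype="http://schema.org/SkiResort">
    <meta itemprop="name" content="Loveland Basin Ski Area">
  </div>
</div>
```

With the container typed as a ski resort, a bot does not have to guess from photos that "Loveland" refers to a ski area rather than a town.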

Question and Answer

Structured Data is essentially a responsive design technique that gives human visitors and machine visitors information in the way that's most helpful to each.

No. Human inferences are imprecise, and confusing ambiguities can arise in human-to-human communication. However, the occasional misunderstanding between people is preferable to the data overload that would occur if we clarified all communications by, say, listing ten facts about a person every time we mentioned their name. Computers' natural language skills will improve, but data overload isn't a problem for them, so they'll get a net benefit from Structured Data for a long time.

Structured Data helps our digital friends emulate the sort of intuitive leaps that we humans make constantly and, in doing so, improves search results.