Metadata on the Web

10 comments | Posted: 21 January 06 in Tutorials, by Yannick Lyn Fatt

Over the past few days I have been wondering what different types of metadata one can add to a website. I’ve seen a number of different types, some familiar to me and others not. But just what exactly is metadata? The W3C defines it as follows:

Information about a document rather than document content.

As the definition says, metadata gives information about the document rather than the content of the document. Search engines can use this information when indexing your page, and it serves other purposes as well, as we will see later on.

The Basics

Now let’s take a look at how a meta tag is put together. There are two basic forms:

<meta name=" " content=" " />

and

<meta http-equiv=" " content=" " />

The name attribute identifies the property and the content attribute specifies the property’s value… The http-equiv attribute may be used in place of the name attribute. HTTP servers can use this attribute to gather information for HTTP response message headers; in practice it is usually the browser that reads the tag and treats it like the equivalent header.
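To make the link between http-equiv and HTTP headers concrete, here is a small sketch. The expires property and the date are only illustrative values, not something this article depends on:

```html
<!-- In the document: -->
<meta http-equiv="expires" content="Tue, 20 Aug 1996 14:25:27 GMT" />

<!-- The HTTP response header it stands in for:
     Expires: Tue, 20 Aug 1996 14:25:27 GMT -->
```

The value of http-equiv becomes the header name and the value of content becomes the header value.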

There is also another attribute called scheme. This attribute allows authors to provide user agents more context for the correct interpretation of metadata. Here is an example:

<meta scheme="ISBN" name="identifier" content="0-8230-2355-9" />

Note: The <meta> tag always appears inside the <head> element.
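As a rough sketch of where the tag sits, here is a bare XHTML page. The doctype, title and body text are placeholders chosen for illustration:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>Example page</title>
  <!-- meta tags go here, alongside the title -->
  <meta scheme="ISBN" name="identifier" content="0-8230-2355-9" />
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>
```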

That’s simple enough to understand, but what sorts of values actually go in those attributes? Let’s first take a look at some properties for the name attribute that should be familiar.

The author property specifies who the author of the web page is. Here’s an example:

<meta name="author" content="John Doe" />

The property description provides a summary of what the web page is about. This is commonly used by search engines to provide the summary that appears under the link in search results. If a description is not provided then the search engine will use text from your page to fill in the summary under the link.

<meta name="description" content="Here is a summary of this page on my website" />

Next is the keywords property. This provides some common words that search engines can use to improve the quality of search results. The idea is that if someone types one of those words into a search engine, your site is more likely to be in the results, although how much weight a search engine gives to keywords varies, and some ignore them because they are easy to abuse. Choose words that match or are related to the content on your site.

<meta name="keywords" content="christian, css, design, development, interface, standards, user, web, xhtml" />

Now let’s take a look at some properties for the http-equiv attribute. By now you should be familiar with this first example, since Nathan wrote about it recently. That property is content-type. I won’t go into any details about it but will just show an example.

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

Next, the content-language property specifies what language the content of the document is written in. Some search engines will use this to categorize documents by language.

<meta http-equiv="content-language" content="en-us" />

The refresh property makes your page reload after a set number of seconds. Some search engines might not like this as they may consider it spam, so it isn’t advisable to use it unless it is necessary. Here is an example:

<meta http-equiv="refresh" content="30" />

A page with the above meta tag will refresh every 30 seconds.
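The same property is also commonly used to send the visitor to a different address after a delay. A minimal sketch, with example.net standing in as a placeholder address:

```html
<!-- Wait 5 seconds, then load the address given after url= -->
<meta http-equiv="refresh" content="5; url=http://www.example.net/" />
```

The number before the semicolon is the delay in seconds; the part after url= is the page to load.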

OK, so by now you are probably saying, “I know all of that already, tell me something I don’t know.” Well, let’s get started on some of those you might not be so familiar with.

The not so familiar…

There is the generator property, which tells what program was used to create the web page.

<meta name="generator" content="Microsoft FrontPage 4.0" />

(Yes, I know, FrontPage, how icky.)

The robots property is used by search engines to determine whether or not they should index the page or follow the links on the page.

<meta name="robots" content="index,nofollow" />

The index value tells search engines that they are allowed to index the page, while noindex tells them not to. The follow value tells them to follow the links on the page, while nofollow does the opposite. Keep in mind that if you use more than one value, they should be separated by commas (as shown in the example above). There are also two more values, all and none. The all value is equivalent to index,follow while the none value is equivalent to noindex,nofollow.
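Here are two more combinations of those values, shown side by side for comparison. A page would normally carry only one robots tag, so treat these as alternatives:

```html
<!-- Keep this page out of the index, but still follow its links -->
<meta name="robots" content="noindex,follow" />

<!-- Shorthand for noindex,nofollow -->
<meta name="robots" content="none" />
```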

I just saw this one for the first time the other day and decided to check it out. The pragma property tells the browser whether or not to cache your web page locally. Every time you visit a website, the browser keeps a copy of the page in the cache, a place on your computer from which the browser can load that page should you want to visit the site again. If the page has not changed since the last time you visited it, the browser will just load the page from the cache instead of going to the web server. This helps to bring up the page a little faster.

<meta http-equiv="pragma" content="no-cache" />

There is no matching cache value to switch caching on: browsers cache pages by default, so you only need the pragma property when you want to prevent caching.
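Browsers differ in how much attention they pay to pragma, so pages that must not be cached often carry a few related tags together. This is only a sketch of that common pattern; the cache-control and expires properties are not covered elsewhere in this article, and real HTTP headers sent by the server are generally more dependable than any of these tags:

```html
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="cache-control" content="no-cache" />
<!-- An invalid date such as 0 is treated as already expired -->
<meta http-equiv="expires" content="0" />
```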

Then we have pics-label. Pics who? PICS stands for Platform for Internet Content Selection, and it is used to let content filters better identify the rating of your site’s content. According to the W3C it also facilitates other uses for labels, including code signing and privacy. An example is as follows:

<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))' />

Microsoft Specific meta properties

When Windows XP came on the scene it added a few enhancements to Internet Explorer: an image toolbar that appears when you hover over images, smart tags that are generated by other Microsoft products, and theming for certain elements on a web page so that they look and feel just like the Windows XP interface. Now we will take a look at some of the metadata that you can use to turn these off, if you so desire.

The imagetoolbar property is not a very common property for the http-equiv attribute. It is directed towards Internet Explorer (IE). When it is set to no, it turns off IE’s image toolbar, which appears when you hover over an image in IE (as shown below).

[Image: MS Image Toolbar]

Here is how you would disable that feature:

<meta http-equiv="imagetoolbar" content="no" />

The next one we will look at is MSSmartTagsPreventParsing.

<meta name="MSSmartTagsPreventParsing" content="true" />

This property prevents Microsoft products from automatically generating “smart tags.”

Internet Explorer in Windows XP also allows elements such as buttons and the scrollbar to have a look and feel similar to the theme used in XP. Sometimes, though, we may not want that look by default, and so we use the following meta tag to turn it off:

<meta http-equiv="msthemecompatible" content="no" />

Geo Tags

I recently found out about Geo Tags, and here is a definition according to Wikipedia:

GeoTagging, sometimes referred to as Geocoding, is the process of adding geographical identification metadata to various media such as websites, RSS feeds or images. This data usually consists of latitude and longitude coordinates, though it can also include altitude and placenames.

Here is an example of it in use:

<meta name="geo.position" content="42.98;-81.25" />
<meta name="geo.placename" content="london, ontario" />
<meta name="geo.region" content="ca-on" />

For geo.position the first value in the content attribute is the latitude, while the second value is the longitude. In geo.placename the content attribute holds the place name, and in geo.region, ca-on is the country subdivision code (Canada, Ontario).
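Some sites pair geo.position with the older ICBM property, which carries the same two numbers separated by a comma instead of a semicolon. A small sketch, using approximate coordinates for London, Ontario as an illustrative assumption:

```html
<!-- Latitude first, then longitude, separated by a comma -->
<meta name="ICBM" content="42.98, -81.25" />
```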

The Dublin Core Metadata Initiative

It is important to note that while there are some commonly used meta tags, there aren’t any real standards. As long as some software understands the properties, you’re good to go. The Dublin Core Metadata Initiative (DCMI) hopes to change that though. According to their website the DCMI “provides simple standards to facilitate the finding, sharing and management of information”. There are a few websites using their standards, one of which is Max Design. Below I will show you an example of it in action:

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DC.title" content="Page Title" />
<meta name="DC.creator" content="John Doe" />
<meta name="DC.subject" content="God; christianity; christ; Jesus; porch; Bible; the word" />
<meta name="DC.description" content="Come have a seat on God's Porch" />
<meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" />
<meta name="DC.format" content="text/html; charset=utf-8" />
<meta name="DC.format" content="8137 bytes" />
<meta name="DC.identifier" scheme="DCTERMS.URI" content="http://www.example.net" />

That was a lot to swallow in one go, I know, but as you look through it you will notice a similarity between the DCMI metadata and some of the others I have mentioned in this article.

Summary

As you have seen, there is a variety of metadata out there and this article only covered a few types. While most of these are not required, I would recommend that your websites at least use the content-type, keywords and description properties.
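Put together, that recommended minimum could look something like the sketch below. The content values are placeholders to be replaced with your own:

```html
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<meta name="description" content="Here is a summary of this page on my website" />
<meta name="keywords" content="css, design, web, xhtml" />
```

It helps to put the content-type tag first, so the browser knows the character encoding before it reads the text in the other tags.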


Discuss This Topic

  1. Kevin

    Great, great article. I was always vaguely aware of the “robots” meta tag, but never did any research. I appreciate your explanation and just added the tag to a site I’m currently developing that is heavily optimized for spiders. Keep up the great work.

     
  2. Yannick

    Thank you Kevin. I am happy it was helpful. I know I learnt quite a bit while researching and writing it up.

    You also keep up the good work.

     
  3. Antti Mäki

    Nicely summarised. The closing slashes of some of the tags seem to be missing.

     
  4. Nathan Smith

    Antti: Good eye! I went back through and made some quick edits to the article. It should be all patched up now. Thanks for pointing that out.

     
  5. Yannick

    Thank you Antti. Much appreciated.

     
  6. Robert

    Nice write up Yannick! Definitely a good resource.

     
  7. Peter Crackenberg

    <meta name="robots" content="index,nofollow" />

    Isn’t this a little redundant to have on the page itself? Most web robots out there already use the robots.txt file and I’d assume the ones that don’t probably wouldn’t respect a meta tag instruction either. ;)

    Anyway, good write up Yannick, I had no clue that meta tags had so many possible uses.

     
  8. Yannick

    Hello Peter,

    Thanks. There are a whole lot more out there. I don’t think this article even covered half of them, but I figured some of the ones mentioned are more common.

    As for what you mentioned about the robots.txt. Yes I would assume that would be true that robots would use the robots.txt file, however I don’t think everyone has that file on their servers (or might want to use it), so in that case the meta tag would probably come in handy. I guess it’s up to the developer which one he/she prefers.

    It seems the good thing about the robots.txt file is that one file can be used to specify quite a number of rules sitewide, whereas the meta tag would have to be put on each page that you want robots not to index or follow.

    By the way thanks for that link to the robots.txt information. That could come in handy in the future.

     
  9. Jason Beaird

    It’s good to read up on and put a robots.txt file in your root since search engines are constantly looking for it. It’s super easy, and as you said, it’s better to define some sitewide rules than defining them on every page.

    Great list Yannick! There may be other meta data tags out there, but that covers all the ones I’ve ever used and a few that I didn’t know about. As a geocacher in my spare time, I particularly like the idea of being able to link a site to a physical location with geo.position (aka icbm) meta data.

     
  10. Mark Priestap

    Outstanding article Yannick. I am starting to see the cobwebs cleared out of my eyes when I look at meta tags. This is definitely Del.icio.us.

    @Jason: I concur on the robots.txt file. I recently started using that and it’s a lot cleaner and easier than adding extra tags at the top of each page.

     
