Metadata on the Web
Posted: 21 January 06 in Tutorials, by Yannick Lyn Fatt
Over the past few days I have been wondering what are the different types of metadata one can add to their website. I’ve seen a number of different types, some of which are familiar to me and others which aren’t. But just what exactly is metadata? The W3C defines it as follows:
Information about a document rather than document content.
As the definition says, metadata gives information about the document rather than being part of the document's content. This information can be used by search engines when indexing your page, as well as for other purposes, as we will see later on.
Now let's take a look at the basic form of the meta tag. There are two basic forms, and they are as follows:
<meta name=" " content=" " />
<meta http-equiv=" " content=" " />
The name attribute identifies the property and the content attribute specifies the property’s value… The http-equiv attribute may be used in place of the name attribute. HTTP servers use this attribute to gather information for HTTP response message headers.
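To make that relationship concrete, here is a rough sketch of an http-equiv tag next to the HTTP response header it stands in for. The content-type value is only an illustrative choice, and the header line is shown inside a comment because it belongs to the server's response, not to the page itself.

```html
<!-- In the page: -->
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<!-- The HTTP response header it corresponds to:
     Content-Type: text/html; charset=utf-8 -->
```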
There is also another attribute called scheme. This attribute allows authors to provide user agents more context for the correct interpretation of metadata. Here is an example:
<meta scheme="ISBN" name="identifier" content="0-8230-2355-9" />
The <meta> tag always appears between the <head> and </head> tags of the document.
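As a minimal sketch of where meta tags sit in a page, here is a bare XHTML skeleton with one tag of each basic form in the head. The title, the author name and the body text are placeholders of my own, not taken from any real site.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <!-- http-equiv form: stands in for an HTTP response header -->
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <!-- name/content form: a named property and its value -->
  <meta name="author" content="John Doe" />
  <title>Example page</title>
</head>
<body>
  <p>The metadata lives in the head, so none of it is displayed here.</p>
</body>
</html>
```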
Now that’s simple enough to understand, but what sorts of values actually go in those attributes? Let’s first take a look at some properties for the name attribute that should be familiar.
The author property specifies who the author of the web page is. Here’s an example:
<meta name="author" content="John Doe" />
The description property provides a summary of what the web page is about. Search engines commonly use it for the summary that appears under the link in search results. If no description is provided, the search engine will use text from your page to fill in that summary.
<meta name="description" content="Here is a summary of this page on my website" />
Next is the keywords property. This provides some common words that search engines may use to improve the quality of search results. Chances are that if someone types one of those words into a search engine, your site will be in the results. Choose words that match or are related to the content on your site.
<meta name="keywords" content="christian, css, design, development, interface, standards, user, web, xhtml" />
Now let’s take a look at some properties for the http-equiv attribute. By now you should be familiar with this first example, since Nathan wrote about it recently. That property is content-type. I won’t go into any details about it but will just show an example.
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
The content-language property specifies what language the content of the document is written in. Some search engines will use this to categorize documents by language.
<meta http-equiv="content-language" content="en-us" />
The refresh property makes your page reload automatically after a set number of seconds. Some search engines might not like this, as they may consider it spam, so it isn’t advisable to use it unless it is necessary. Here is an example:
<meta http-equiv="refresh" content="30" />
A page with the above meta tag will refresh every 30 seconds.
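The same property is also commonly used to send the visitor to a different page after a delay. A small sketch, assuming a made-up destination address on example.net: the delay in seconds comes first, followed by a semicolon and the url.

```html
<!-- Wait 5 seconds, then load the (hypothetical) new address
     instead of reloading the current page -->
<meta http-equiv="refresh" content="5; url=http://www.example.net/new-page.html" />
```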
OK, so by now you are probably saying, “I know all of that already, tell me something I don’t know.” Well, let’s get started on some of those you might not be so familiar with.
The not so familiar…
There is the generator property, which tells what program was used to create the web page.
<meta name="generator" content="Microsoft FrontPage 4.0" />
(Yes, I know, FrontPage, how icky.)
The robots property is used by search engines to determine whether or not they should index the page or follow the links on the page.
<meta name="robots" content="index,nofollow" />
The index value tells search engines that they are allowed to index the page, while noindex tells them not to. The follow value tells them to follow the links on the page, while nofollow does the opposite. Keep in mind that if you are using more than one value, they should be separated by commas (as shown in the example above). There are also two more values, all and none. The all value is equivalent to index,follow while the none value is equivalent to noindex,nofollow.
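Putting the values together, the four possible combinations can be written out as follows. This is only an illustrative listing; a real page would carry just one of these lines.

```html
<!-- Index this page and follow its links (same as content="all") -->
<meta name="robots" content="index,follow" />
<!-- Index this page but do not follow its links -->
<meta name="robots" content="index,nofollow" />
<!-- Do not index this page, but follow its links -->
<meta name="robots" content="noindex,follow" />
<!-- Neither index this page nor follow its links (same as content="none") -->
<meta name="robots" content="noindex,nofollow" />
```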
I just saw this one for the first time the other day and decided to check it out. The pragma property tells the browser whether or not to cache your web page locally. Every time you visit a website, the browser keeps a copy of it in the cache, a place on your computer from which the browser can load the page should you want to visit the site again. If the page has not changed since your last visit, the browser will load it from the cache instead of going to the web server, which helps to bring up the page a little faster.
<meta http-equiv="pragma" content="no-cache" />
If cache were used instead of no-cache then, of course, the browser would cache your pages, but that is usually done by default, so you wouldn’t need to use the pragma property in that case.
Then we have pics-label. Pics who? PICS stands for Platform for Internet Content Selection, and it is used to allow content filters to better identify the rating of your site’s content. According to the W3C, it also facilitates other uses for labels, including code signing and privacy. An example is as follows:
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))' />
Microsoft Specific meta properties
When Windows XP came on the scene it added a few enhancements to Internet Explorer: an image toolbar that appears when you hover over images, smart tags that are generated by other Microsoft products, and theming for certain elements on a web page so that they look and feel just like the Windows XP interface. Now we will take a look at some of the metadata that you can use to turn these off, if you so desire.
imagetoolbar is not a very common property for the http-equiv attribute. It is directed at Internet Explorer (IE): when used, it turns off the image toolbar that appears when you hover over an image in IE.
Here is how you would disable that feature:
<meta http-equiv="imagetoolbar" content="no" />
The next one we will look at is the MSSmartTagsPreventParsing property:
<meta name="MSSmartTagsPreventParsing" content="true" />
This property prevents Microsoft products from automatically generating “smart tags.”
Internet Explorer in Windows XP allows elements such as buttons and the scrollbar to take on a look and feel similar to the theme used in XP. Sometimes, though, we may not want that look, so we use the following meta tag to turn it off:
<meta http-equiv="msthemecompatible" content="no" />
I recently found out about Geo Tags, and here is a definition of geotagging according to Wikipedia:
GeoTagging, sometimes referred to as Geocoding, is the process of adding geographical identification metadata to various media such as websites, RSS feeds or images. This data usually consists of latitude and longitude coordinates, though it can also include altitude and placenames.
Here is an example of it in use:
<meta name="geo.position" content="49.2;-123.4" />
<meta name="geo.placename" content="london, ontario" />
<meta name="geo.region" content="ca-on" />
In geo.position, the first value in the content attribute is the latitude, while the second value is the longitude. In geo.placename, the value in the content attribute represents the place name, and in geo.region, ca-on represents the country subdivision code (here Canada, Ontario).
The Dublin Core Metadata Initiative
It is important to note that while there are some commonly used meta tags, there aren’t any real standards. As long as some program understands the properties, you’re good to go. The Dublin Core Metadata Initiative (DCMI) hopes to change that, though. According to their website, the DCMI “provides simple standards to facilitate the finding, sharing and management of information”. There are a few websites using their standards, one of which is Max Design. Below I will show you an example of it in action:
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DC.title" content="Page Title" />
<meta name="DC.creator" content="John Doe" />
<meta name="DC.subject" content="God; christianity; christ; Jesus; porch; Bible; the word" />
<meta name="DC.description" content="Come have a seat on God's Porch" />
<meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" />
<meta name="DC.format" content="text/html; charset=utf-8" />
<meta name="DC.format" content="8137 bytes" />
<meta name="DC.identifier" scheme="DCTERMS.URI" content="http://www.example.net" />
That was a lot to swallow in one go, I know, but as you look through it you will notice a similarity between the DCMI metadata and some of the others I have mentioned in this article.
As you have seen there are a variety of metadata out there and this article only covered a few. While most of these are not required, I would recommend that your websites at least use the
References and Further Reading
- Smart Tags
- What are Meta Tags
- Image Toolbar
- Meta – Meta Data
- W3 Schools – The Meta Element
- W3 Schools – The Meta Tag
- W3C – META
- Geo Tags
- About the Dublin Core Metadata Initiative
- DCMI Metadata Terms