One of the things we spend time on for Channel 9 is trying to make sure we are following the right standards, adopting new ones as necessary and that we render correctly for both real people and for search engines. As part of that ongoing work, we recently adopted two new concepts.
The first is the concept of a canonical URL, as described here by Matt Cutts (his blog is a must-read for anyone building public-facing web sites): a link element that specifies what our definitive single URL should be for any given post.
Why does this matter? Well, on many web sites (including ours) there is more than one URL that will get you to the same piece of content. Consider the latest 'This Week On Channel 9' episode; it is available at:
- and nearly any variation of those URLs plus any random query string that you want to stick on the end
Assuming Google/Bing/Yahoo only ever found the post from links on the Channel 9 home page, it would always see the first one... and everything would be great. That isn't how search engines work, though; they care about inbound links from many different sources, and many different URLs can be out there in the wild that all represent the same single piece of content. Each additional URL beyond the first looks like duplicate content and takes away from the search engine love that should go to the first result. The standard way to avoid this in the past was to redirect every visitor coming in on anything but the link we want. That works, but forcing your user through an extra browser round trip for an obscure technical reason is less than ideal. Enter the canonical URL: add this link element to your page and, no matter how the search engine finds the page, it knows which URL to associate the content with in its system.
If you check any of those links above and view source, you'll find the same thing on each and every one:
<link rel="canonical" href="https://channel9.msdn.com/shows/This+Week+On+Channel+9/This-Week-C9-Windows-7-RTMs-7-Sins-of-App-Compat--cool-Silverlight-apps/" />
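To see how a crawler or tool might pick this tag up, here is a minimal sketch using Python's standard-library `html.parser`; the `CanonicalFinder` class name and the inline sample markup are my own illustration, not anything from the Channel 9 codebase:

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag encountered."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are routed here by html.parser as well.
        if tag == "link" and self.canonical is None:
            attr_map = dict(attrs)
            if attr_map.get("rel") == "canonical":
                self.canonical = attr_map.get("href")

page = ('<html><head><link rel="canonical" '
        'href="https://channel9.msdn.com/shows/This+Week+On+Channel+9/'
        'This-Week-C9-Windows-7-RTMs-7-Sins-of-App-Compat--cool-Silverlight-apps/" />'
        '</head></html>')

finder = CanonicalFinder()
finder.feed(page)
print(finder.canonical)
```

However the page was reached, a consumer that honors the tag ends up with the one definitive URL.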
A fair number of other sites implement this as well (check out Ars Technica, for example), and I'm sure more will follow. Should you implement it on your site? First, think about how many duplicate URLs can reach your content: if your site answers to both www.sitename.com and sitename.com, that's one duplicate; if a filename like default.php or index.html is optional, that's already two duplicates for every URL. Then think about whether your position in search engine rankings matters to you... odds are you should look at adding this link tag to your pages.
A lot has been written about how TinyURL, is.gd and other URL shortening services are bad for the internet, and we completely agree. They remove meaning from the link you are about to click (including the source of the content, which is an important issue for trust and security), and they depend on the reliability of some unknown third party that might just go away at some point in the future. One solution, which sites like C9, Amazon and others have decided to go with, is to implement their own URL shortening. Yes, much of the meaning is still lost, but at least they control that URL namespace and can make sure it is always available and always points at the intended content.

Now that we have such a service, though, what's to stop people from just taking our original URLs and running them through any of the free URL shortening services? Nothing at the moment, but a movement is underway to let content owners specify a pre-existing short URL if they have one. The hope is that once this concept catches on, URL shortening services and client applications (like Twitter clients) will look up a site's short URL before calling out to a third-party service to create one. We don't know if this will catch on, but we like the idea, so we've gone ahead and added the appropriate link tag and populated it with our special r.ch9.ms short URL. Once again, if you view source on that TWOC9 episode from above, you'll find this:
<link rel="shorturl" href="http://ch9.ms/AAPV" />
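The client-side lookup described above can be sketched in a few lines; this is just an illustration of the idea, and the `ShortURLFinder` class, the `shorten` helper and the fallback lambda are all hypothetical names of my own, not any real Twitter-client API:

```python
from html.parser import HTMLParser

class ShortURLFinder(HTMLParser):
    """Records the href of the first <link rel="shorturl"> tag encountered."""

    def __init__(self):
        super().__init__()
        self.short_url = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.short_url is None:
            attr_map = dict(attrs)
            if attr_map.get("rel") == "shorturl":
                self.short_url = attr_map.get("href")

def shorten(page_html, third_party_shortener):
    """Prefer the short URL the site itself declares; only then fall back to a third party."""
    finder = ShortURLFinder()
    finder.feed(page_html)
    if finder.short_url:
        return finder.short_url
    return third_party_shortener()

page = '<head><link rel="shorturl" href="http://ch9.ms/AAPV" /></head>'
# The site's own short link wins; the fallback shortener is never called.
print(shorten(page, lambda: "http://example-shortener.test/xyz"))
```

A client that works this way keeps the link under the content owner's control whenever the site offers one, and only leans on an outside service for pages that don't.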
More to come...
If you went and viewed the source of those pages, you would have seen a ton of other <link> tags, and many different <meta> tags as well. Each of those does serve a purpose, and I'll dig into the rest of them in upcoming posts.
Note: For those of you whose immediate response is 'aghhh... so much wasted bandwidth for meta tags!', I know what you mean... and I also know that some sites choose to render those tags only when a search crawler hits them. That practice is a bit sneaky, though; in general you should avoid altering the content you serve to search engines... although it is probably only an issue if you change the actual content of the page in an attempt to deceive them.