OK, after being initially skeptical, I’ve become convinced that Yahoo! SearchMonkey has the potential to really change the game in search.  The evidence is mounting that they have really thought this through, and that they intend to disrupt the existing order.  The plan is somewhat crazy, but this just might work.

When SearchMonkey launched, about 6 weeks ago, it seemed that the news was primarily lauded by proponents of RDF who believed Yahoo!’s endorsement of RDF would resurrect their beloved but anemic Semantic Web (with a big “S”) standard and give it legs to finally dethrone the small-“s” semantic web technologies like tags and microformats.  To understand why they were so excited, you need to understand that it is the search engines who strangled RDF in the first place.

7 years ago (when Google was still a serious underdog), I argued that the search engines completely control the fate of “semantic web” standards, and explained that the major search engines have very little business incentive to support such standards.  You can read the whole whitepaper, but the summary is simple:

1) Search engines are the dominant way that people query for entities on the web, and it’s nearly impossible to get authors to add the semantics and bootstrap the semantic web if search engines ignore the semantics or promote competing semantics.

2) When it is difficult to extract semantics from documents, the advantage goes to incumbents with massive-scale data centers who can extract semantics from natural language.  This creates barriers to entry for new competitors.

3) As a top search engine, you want the most useful semantic information stored in a format that your competitors cannot utilize.

For 7 years, my thesis held.  At the time, I lobbied both Google and Microsoft to start indexing RDF (and later microformats).  My hope was that their desire to disrupt the (then) dominant Yahoo! search position would lead to a more open web.  But for 7 years, no search engine was crazy enough to truly adopt open standards for semantics.  In fact, Google even dropped support for the rudimentary semantics of meta tags during that time period.

Then came SearchMonkey.  For the reasons outlined above, indexing RDF and microformats is a pretty crazy underdog disruptive strategy, so I was skeptical.  At first, my skepticism seemed to be justified:

1) At first, they supported only a handful of partners.  See point #3 above.

2) The functionality was totally opt-in by consumers, and Yahoo! was doing nothing to evangelize it to average users.  It looked like a silly PR stunt to curry favor with the RDF and microformats camps, and clearly Yahoo! was not putting any wood behind it.

3) Semantics can only be added by document owners, on their own subdomain.  This immediately favors large incumbents.  See the whitepaper for a description of why author-created metadata is a very weak form of semantics.

In the past 2 weeks, however, the first two reasons for my initial skepticism have been obliterated.  The SearchMonkey gallery has expanded, and there are a number of interesting services already available.  It appears that Yahoo! is promoting services which are not necessarily created by the site authors, which is huge.  Check out the Wikipedia Topics entry, for example.  And the PHP API entry is a perfect example of why opt-in by default was a good choice – I may want my search results to show PHP API entries, but most people do not.  In addition, Yahoo! has started to promote the gallery from the home page of, under the customize button.

This isn’t a PR stunt.  These guys are serious.  Yahoo! took the single thing that drives publisher behavior (search engine exposure) and tied it squarely to open semantic standards.

Now, let’s contrast this with the Google approach.  Google was the very first to offer “blended” search results, and much was made of the fact that Google Maps returns microformats on its search results page.  But spitting up microformats from your proprietary index is the opposite of consuming microformats to enrich your index.  And the mechanism by which Google attaches semantics to the “plus box” is notoriously opaque.  Watching people beg Matt Cutts for information, insinuate that blended results on the SERP amount to paid placement, or speculate about the algorithm as it changes under their feet (did Google “plus box” really just start scraping hCard?) makes you appreciate the way that Yahoo! does it out in the open.

Google SERP grabbed the hCard?

Google did pay lip service to “out in the open”, when they launched Google Base to much fanfare and started integrating Google Base results into the main search results page.  But Google Base still required publishers to store their content in Google’s servers, and the prominent listing on the search results page quickly became a distant memory and Google Base a black hole with little influence on the main search page.

I think people were a bit confused when Yahoo! claimed that SearchMonkey is a “long tail” strategy.  But the contrast with Google’s approach should have made it clear by now that they are right.  Yahoo!’s model of user opt-in makes room for both the default mass-appeal plugins (like Flickr) and the more niche plugins like PHP APIs.

Overall, this is very strong progress in just 6 weeks.  To keep up the momentum, Yahoo! needs to continue promoting to end-users, and should be more aggressive about influencing search results ordering when SearchMonkey plugins are installed.  For example, I have opted in to the Yelp plugin, but perfectly good Yelp results often get pushed off of the page by CitySearch and others.  Random samplings of users who haven’t tried any customizations should be shown enhanced search results pages and offered the chance to customize.

In addition, Yahoo! should allow SearchMonkey plugins to customize results for other pages.  For example, I should be able to see the IMDB information next to a search result for a blog page that reviews a movie.  This would truly bootstrap the use of microformats, since adding a microformat to your page would automatically make it more useful to anyone using Yahoo!’s search engine.  Google tried something similar, with less than stellar results, when they started using scraped addresses from around the web to enhance their map “plus box”.  When Google scraped restaurant addresses from old and outdated sites, the search results page “plus box” started directing diners to the wrong location, leaving restaurant owners bewildered as they tried to figure out where the wrong data was coming from.  Yahoo!’s approach guards against this, since people opt in to the provider, and they know where the data is coming from.  In Google’s approach, you get whatever plugins Google gives you, and you have no idea where they are getting the data.
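To make "consuming a microformat" a little more concrete, here is a rough sketch of pulling hCard data out of a page using only the Python standard library.  This is my own illustration, not how SearchMonkey or Google actually does it: the `HCardParser` class, the `extract_hcards` helper, and the short list of properties are all assumptions made for the example, and it assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Assumed subset of hCard property class names; the real spec defines more.
HCARD_PROPS = {"fn", "org", "street-address", "locality", "tel"}
# Tags that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    """Collects text from elements carrying hCard class names (sketch only)."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per element with class "vcard"
        self._stack = []  # per open tag: "vcard", a tuple of props, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        marker = None
        if "vcard" in classes:
            self.cards.append({})
            marker = "vcard"
        elif "vcard" in self._stack:
            props = classes & HCARD_PROPS
            if props:
                marker = tuple(sorted(props))
        self._stack.append(marker)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Walk outward from the innermost open tag to the nearest property.
        for marker in reversed(self._stack):
            if marker == "vcard":
                return  # text sits directly in the card, not in a property
            if marker:
                card = self.cards[-1]
                for prop in marker:
                    card[prop] = (card.get(prop, "") + data).strip()
                return


def extract_hcards(html):
    """Return a list of dicts, one per hCard found in the HTML string."""
    parser = HCardParser()
    parser.feed(html)
    return parser.cards
```

Feed it a snippet such as `<div class="vcard"><span class="fn org">Joe's Diner</span> <span class="tel">555-0100</span></div>` and you get back one dictionary with the name, organization, and phone number.  The point is how little work this takes compared with guessing an address out of free-form text, which is exactly why explicit markup lowers the barrier to entry.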

If Yahoo!’s share stabilizes or increases, I would expect Google to respond by being more aggressive with their “plus box”, and perhaps embracing and extending, with an eye to extinguishing SearchMonkey.  SearchMonkey will encourage the greatest proliferation of microformats yet seen on the Internet, and as more microformats are available, Google will certainly start to leverage this information more in building their index.



