Coffeehouse Thread

6 posts


Flash of genius in improving search engine results (?)

  • androidi

    I read this in the Codinghorror comments:

    "This may be overly simplistic, but could the magic dial Google needs to turn simply be to adjust how much the date something was published adds to the ranking?"

     http://www.codinghorror.com/blog/2011/01/trouble-in-the-house-of-google.html#comments

     

    I just got a sudden temptation to start scraping the web and then, years later, put up a search engine based on this algorithm (plus a couple of ideas of my own). In my experience the oldest content on the web is the most worthwhile content. Spam companies' domains come and go, so they wouldn't have any chance to show up anywhere but dead last in the results. Also, if the content at a URL changes a lot, that would rank it lower, unless the site was somehow recognized as a "useful dynamic site / database-type site".
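
    Very roughly, the kind of scoring I have in mind looks something like the sketch below. All the field names and weights are made up just to illustrate the idea, not anyone's actual formula.

        # Toy sketch: long-lived, stable URLs score higher; churn is only
        # forgiven for sites recognized as legitimately dynamic.
        import time

        def age_score(first_seen_ts, last_changed_ts, is_dynamic_site, now=None):
            now = now or time.time()
            years_known = (now - first_seen_ts) / (365 * 24 * 3600)     # how long the URL has existed
            years_stable = (now - last_changed_ts) / (365 * 24 * 3600)  # time since the content last changed
            score = years_known
            if not is_dynamic_site:
                score += years_stable
            return score

    A freshly registered spam domain would have years_known near zero and sink to the bottom, which is exactly the behaviour I'm after.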

    Then, if a spam result showed up on the first pages, just add a browser feature: if the user clicks through to the site but comes back soon (back button -> sends the same query again?), assume the site wasn't what the query was about and rank it lower. Or, if the user hovers over the result but doesn't click, that also lowers the ranking, though not as much.
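
    As a toy sketch of that feedback loop (the penalty values are invented; only the shape of the idea matters):

        # Demote a result hard when the user bounces straight back to the
        # results page, and slightly when they hover but never click.
        BOUNCE_PENALTY = 1.0
        HOVER_NO_CLICK_PENALTY = 0.25

        def adjust_rank(score, clicked, returned_quickly, hovered):
            if clicked and returned_quickly:
                return score - BOUNCE_PENALTY
            if hovered and not clicked:
                return score - HOVER_NO_CLICK_PENALTY
            return score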

     

    This would also encourage people to keep the same static content up at exactly the same address for as long as possible (though maybe some way to relocate the content and preserve the ranking, for a fee, could be added for anyone who absolutely had to move content to a new domain or whatever).

     

    edit: how to distinguish content ripper/scraper sites from the content publisher, and how to deal with frequently updated new content:

    a) The publisher publishes the new/changed content to Google first, even if only seconds before the scraper.

    b) Since the publisher's URL, which is used for publishing to everyone, has been around longer, it will be ranked higher.

    Both are used in combination for "dynamic" sites; a rough sketch of the combined idea is below.
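
    Rough sketch of a) and b) together (the field names are invented): the earliest submission wins, and ties go to the URL that has been known the longest.

        # Pick the original publisher among several copies of the same content.
        # candidates: list of dicts with 'url', 'submitted_ts', 'url_first_seen_ts'.
        def pick_original(candidates):
            return min(candidates,
                       key=lambda c: (c["submitted_ts"], c["url_first_seen_ts"]))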

  • PaoloM

    Actually, I would say that the newest content is the most relevant. I guess it depends on what kind of context you are talking about...

  • C9Matt

    I'm pretty sure the oldest page on any given news website is less interesting than the newest one.

    Also, if you have a great idea for search, I'd patent it and sell it to Microsoft, Google, or Yahoo rather than setting up your own search engine, because otherwise you're going to lose that battle.

  • brian.shapiro

    @C9Matt:

    You could always extend such an algorithm so that if two pages are on the same site, it promotes the newer page over the older one; but if they're on different sites and everything else is equal, it promotes the older page.
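
    As a rough sketch of that tiebreak, with invented fields (just to illustrate the rule, not any real ranker):

        # Prefer the more relevant page; on a tie, prefer the newer page within
        # one site and the older page across different sites.
        def prefer(page_a, page_b):
            if page_a["relevance"] != page_b["relevance"]:
                return page_a if page_a["relevance"] > page_b["relevance"] else page_b
            if page_a["site"] == page_b["site"]:
                return page_a if page_a["published_ts"] > page_b["published_ts"] else page_b
            return page_a if page_a["published_ts"] < page_b["published_ts"] else page_b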

    If people were looking for something that was genuinely news, I don't see why they wouldn't do a news search limited to a recent time span rather than a general search of the web.

    For tech searches, I think it's OK to expect users to enter the version of their software. Otherwise, they can do a site search on a tech site to get better results. (Not every search has to be done in the same place.)

    The biggest problem with this concept is maybe figuring out how to deal with the "search squatters" that would come up.

     

  • MasterPi

    I remember reading a paper from MSR in which they outline an algorithm that caters to shifts in intent for search keywords. For example, if you were searching for "eggs" during Easter, the algorithm would build a new ranking based on recent queries (and other data such as PageRank). That way, old data that isn't relevant for that time period won't creep up.
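
    The gist, as a made-up sketch (the blend weight and signals here are my own invention, not from the paper): blend a static score with how often a page has recently been chosen for this exact query.

        # Re-rank using recent query behaviour on top of a static (PageRank-like) score.
        def temporal_score(static_score, recent_clicks_for_query,
                           total_recent_clicks, blend=0.5):
            recency = (recent_clicks_for_query / total_recent_clicks
                       if total_recent_clicks else 0.0)
            return (1 - blend) * static_score + blend * recency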

    I'm not sure if this has actually been implemented into Bing, though.

  • brian.shapiro

    @MasterPi:

    I don't like attempts to guess the user's intent; it's always screwed up when you get the intent wrong. But I do think search algorithms should be based more on how a human would analyse and organize the data if he were going through those billions of search results by hand, say, on behalf of another person. You can figure out a lot that way, I think.

    Also, personally, as someone used to software development, I'd be extremely happy with some type of search query based on regex-style pattern matching.
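
    Something like this, purely as an illustration (scanning every document is obviously not how you'd do it at web scale):

        # Match a regex against an indexed set of documents.
        # documents: dict mapping url -> text; returns the matching urls.
        import re

        def regex_search(pattern, documents):
            rx = re.compile(pattern)
            return [url for url, text in documents.items() if rx.search(text)]

        # e.g. regex_search(r"IIS 7\.\d", docs) to find any 7.x version string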

     

Conversation locked

This conversation has been locked by the site admins. No new comments can be made.