CNN is reporting that Anna Patterson, who sold her first web search technology to Google, is coming out publicly with a new, supposedly better search engine called Cuil. She claims it spans a much larger portion of the web than Google, and therefore gives better results. Read the full article on CNN here.
In Google’s defense, I don’t think raw index size necessarily correlates with a better search experience. Yes, having a complete index is good, but what matters is being able to parse the index, decide what is relevant and valuable to users, and ultimately deliver the best, most useful search results. If Cuil returns a ton of random pages that may not exist in Google’s index but are not as useful, those extra pages have no value to me. The golden balance is optimizing index size alongside content analysis and parsing, so that the searchable index is reduced to only those pages that are valuable to the user.
“It’s not the size of your index, it’s how you use it.”
This morning, in response to Cuil, Search Engine Land had a series of good thoughts about exactly this – Google “Knows” About 1 Trillion Web Items discusses the size of Google’s index, and their full Cuil post “Cuil Launches — Can This Search Start-Up Really Best Google?” thoroughly tests out this “Google competitor”.
In my quick test of Cuil, I didn’t find what I was looking for. “Cuiling” my own name (with safe search on) returned nothing, whereas Googling my name (with safe search turned on, at the default setting) returned 698 results. However, Cuiling my name with safe search turned off returned 25,110 results, whereas Googling with safe search turned off returned the same 698 results. Honestly, although Cuil returned more results with safe search off, I’d still trust Google more. To my knowledge, there’s no content about me on the web that should be restricted by a safe search filter, yet Cuil deemed all of it “not safe” and only showed it when safe search was turned off. This is exactly what I’m talking about – it’s not necessarily all about the amount of raw data in an index, but more importantly, how a search engine parses and organizes that data.