jeremy says:
“as far as recall goes: Cuil searches more pages on the Web than anyone else—three times as many as Google and ten times as many as Microsoft;”
Just a minor correction: Larger index != Higher Recall. Recall is defined as the proportion of relevant documents that are returned by the search engine. Since none of these search engines (Google included) returns more than 1,000 documents as the result of your query, the size of the index is not the determining factor when assessing recall. The fraction of the relevant documents that make it into those 1,000 is the key.
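To make the two measures concrete, here is a tiny sketch in Python (the document IDs and counts are invented, purely for illustration):

```python
# Toy illustration (invented IDs): precision vs. recall for one query.
relevant = {1, 2, 3, 4, 5, 6}                      # all documents truly relevant to the query
retrieved = {1, 2, 3, 10, 11, 12, 13, 14, 15, 16}  # what the engine returned

hits = relevant & retrieved                 # relevant AND retrieved

precision = len(hits) / len(retrieved)      # 3/10: fraction of the returned set that is relevant
recall = len(hits) / len(relevant)          # 3/6: fraction of the relevant set that was returned

print(precision, recall)                    # 0.3 0.5
```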
So is Cuil claiming to have higher recall? Or just a larger index?
Daniel says:
Yes. Jeremy.
jeremy says:
Yes, which? Yes to higher recall? Or yes to larger index? Because the two are not the same. “Searching more pages” is not a valid measure of recall. Especially, again, since it doesn’t matter if a search engine says that there are 20,000 or 500,000,000 hits. Because no matter how many hits the search engine says it has, it only “returns” the top 1,000. Ceteris paribus, a search engine that says it has 20,000 hits is actually equivalent to one that says it has 500,000,000 hits, in terms of recall. Because recall is defined as %relevant@maxReturned. And maxReturned is, in all search engines of which I am aware, equal to 1,000.
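To see why the advertised hit count washes out, here is a sketch (all counts invented): two engines with wildly different claimed totals, both capped at 1,000 returned documents, end up with identical recall.

```python
# Invented example: hit counts beyond the 1,000-document return cap cannot affect recall.
MAX_RETURNED = 1000

def recall_at_cap(relevant, ranked_results, cap=MAX_RETURNED):
    """Recall computed over only the documents the user can actually retrieve."""
    returned = set(ranked_results[:cap])
    return len(relevant & returned) / len(relevant)

relevant = set(range(50))  # pretend exactly 50 documents are relevant

# Top 1,000 of an engine claiming 20,000 hits, and of one claiming 500,000,000 hits.
# Both happen to surface the same 30 relevant documents in their top 1,000.
top1000_small_claim = list(range(30)) + list(range(1000, 1970))
top1000_huge_claim = list(range(30)) + list(range(2000, 2970))

print(recall_at_cap(relevant, top1000_small_claim))  # 0.6
print(recall_at_cap(relevant, top1000_huge_claim))   # 0.6
```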
The last I checked, search engines will not even submit to being evaluated side-by-side in terms of precision. Much less recall. And if you can’t do precision, you can’t do recall, since both require an agreement on standard test queries and standard relevance judgments in order to do the comparison.
So how is it that Cuil is claiming to have 3 times greater recall than Google? Where are they getting these measurements from?
Daniel says:
Jeremy: “Yes” you are right. Or do you want me to argue? 😉
Well. Recall is not defined as you do define it, but that’s a technical issue.
Surely, what Cuil means to tell us is that they are offering better recall. (Because just having a bigger index does not bring any benefit to the users.) However, as you point out, it would be hard to measure recall per se, since for a given query it is hard to tell which documents are relevant.
However, surely we can measure relative recall in a sensible fashion: score each engine against the pool of relevant documents found by all the engines together, rather than against an unknowable “true” relevant set.
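Something along these lines, say (a rough sketch only; the pooling idea is the standard workaround for not knowing the true relevant set, and the engine names and results below are made up):

```python
# Sketch of relative recall: pool the judged-relevant documents found by ALL engines,
# then score each engine against that pool instead of the unknowable true relevant set.
def relative_recall(engine_hits, all_engines_hits):
    pool = set().union(*all_engines_hits)  # relevant documents found by any engine
    return len(set(engine_hits) & pool) / len(pool)

# Hypothetical judged-relevant results for one query:
results = {
    "engine_a": {"d1", "d2", "d3", "d4"},
    "engine_b": {"d1", "d2", "d5"},
    "engine_c": {"d1", "d6"},
}

for name, hits in results.items():
    print(name, relative_recall(hits, results.values()))
# pool = {d1..d6}; engine_a: 4/6, engine_b: 3/6, engine_c: 2/6
```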
Oh! And I do not agree that all search engines return 1,000 documents at most. This may be what most people will look at, but I think Google can return more than 1,000 documents for a given query. I admit I did not check, but I doubt that they have an arbitrary threshold like that.
Daniel says:
Jeremy? I have agreed with you from the beginning.
BTW, you are right: search engines do appear to limit the result set. Interesting; I never noticed that before.
Again, please, I am not arguing against you…
No. I knew what precision and recall were. My post is misleading and I misused the term “recall”. But after all these comments, fixing the post would be a shame. 😉
jeremy says:
“Well. Recall is not defined as you do define it, but that’s a technical issue.”
Well, Recall *is* defined as I define it. From exactly the link that you give. Recall is the size of the intersection of the *relevant* documents with the *retrieved* documents, divided by the size of the relevant set. That’s exactly what I’ve been saying. If you read what I wrote above, I said it was % relevant @ maxReturned. The denominator of the formula you quote is the size of the relevant document set, which gives you the % (percentage) that I mention. And maxReturned is the exact same thing as {retrieved}, the retrieved set. Retrieved = returned to the user. Shown to the user. Given to the user. Whatever you want to call it. maxReturned is simply the set of documents that the user has access to, from the search engine.
So how is what I said different from the (correct) recall formula link that you gave? It’s exactly what I just said.
And yes, go to Google right now and type in a query that you think should return more than 1000 hits. Then scroll to the bottom of the page, and click the 10th page of results. Then scroll to the bottom of that page, and click the 19th page of results. Then the 28th page, etc. See how far you get. If you get more than 1000, you should be able to get to the 101st page of results. I’ll give you 5:1 odds that you won’t be able to. Google does have that arbitrary threshold. Try it for a few queries, even. It shouldn’t take you more than a minute or two to really check.
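If you’d rather do the arithmetic than the clicking (assuming the default of 10 results per page):

```python
# With 10 results per page, page p shows results 10*(p-1)+1 through 10*p.
def results_on_page(page, per_page=10):
    return per_page * (page - 1) + 1, per_page * page

print(results_on_page(100))  # (991, 1000): the last page reachable under a 1,000-result cap
print(results_on_page(101))  # (1001, 1010): past the cap, so this page never materializes
```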
Anyway, I’ll post another constructive (rather than defensive) comment in an hour or two, about recall. But I just had to note, as you also found in that link defining recall, that recall is not defined by the size of the retrieved set alone (maxReturned); it is defined as the proportion of relevant documents in that set. Which is all I have been saying.
jeremy says:
Sorry, Daniel, I guess I’ve misunderstood you. I always thought that you were equating recall with “returned count” (“as far as recall goes: Cuil searches more pages on the Web than anyone else”). Text is sometimes a difficult medium for understanding.
So it seems to me that what we’d want to measure, instead of “recall”, is “coverage”. And maybe “coverage” could be better approximated by the search engine’s statement of the number of “hits” (results 1-10 of about 520,000).
But even there, despite the cutoff=1000 limitation, we have to be able to know how those hits were arrived at. For example, does the search engine do stemming? I.e., when you type the query “cooks”, does the search engine also match pages in the index with the word “cook”? How about “cooking”? If two search engines have the exact same size and content in their index, the exact same coverage of the web, and one engine does stemming and the other does not, then the engine that does stemming is going to show a larger number of “hits” than the other engine. Even though their indexes are equivalent!
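Here is a toy sketch of that effect (an invented four-document index and a deliberately crude stemmer, nothing like a production implementation):

```python
# Two engines over the SAME index; only the query-to-word matching differs.
index = {
    "doc1": "the cook prepared dinner",
    "doc2": "she cooks every evening",
    "doc3": "a book about cooking",
    "doc4": "gardening for beginners",
}

def crude_stem(word):
    # Deliberately naive stemmer, purely for illustration.
    for suffix in ("ing", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

def hit_count(query, stemming=False):
    normalize = crude_stem if stemming else (lambda w: w)
    target = normalize(query)
    return sum(
        any(normalize(word) == target for word in text.split())
        for text in index.values()
    )

print(hit_count("cooks"))                 # 1 hit without stemming (only doc2)
print(hit_count("cooks", stemming=True))  # 3 hits with stemming (doc1, doc2, doc3)
```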
We face a similar issue if the search engine does automatic query expansion based on pseudo-relevance feedback. Or if the search engine does automatic query expansion based on latent semantic analysis. In those cases, the engine might automatically add to your “cooks” query the terms “chef”, “food”, “culinary”, etc. When that happens, you will also naturally expand the number of “hits” that the search engine says is available. (Again, with the caveat that the engine still lies and only gives you the top 1000.)
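And a matching sketch for expansion (the expansion table below is hand-made, standing in for whatever pseudo-relevance feedback or LSA would actually produce):

```python
# Same flavor of toy index; now the query is expanded before matching.
index = {
    "doc1": "the cook prepared dinner",
    "doc2": "she cooks every evening",
    "doc3": "a chef reviews culinary schools",
    "doc4": "gardening for beginners",
}

# Hypothetical expansion table standing in for pseudo-relevance feedback / LSA output.
expansions = {"cooks": {"chef", "food", "culinary"}}

def hit_count(query, expand=False):
    terms = {query} | (expansions.get(query, set()) if expand else set())
    return sum(bool(terms & set(text.split())) for text in index.values())

print(hit_count("cooks"))               # 1 hit: only doc2 contains "cooks"
print(hit_count("cooks", expand=True))  # 2 hits: doc3 now matches via "chef"/"culinary"
```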
So to answer the question of what engine has better coverage, we have to know how each engine is doing the matching between query and index. Is there stemming? Latent semantic analysis? Pseudo-relevance feedback? Etc?
Klaus says:
Cuil does not make use of any user information (such as clickstreams, user profiles, etc.) — at least according to what I’ve read. How could they without an existing user base?
This information, though, is decisive these days for building a competitive search engine, and it is probably the biggest asset that Google, Yahoo, and Microsoft sit on. There is also recent evidence from the research community that it helps a lot (cf. the paper BrowseRank: Letting Web Users Vote for Page Importance from this year’s SIGIR).
Thus, Cuil might remain one of the (last) attempts to enter the search-engine race from scratch, burning piles of venture capital.
I am afraid that race is over. (I’d be happy to be proven wrong, though ;o))
jeremy says:
Well, I agree — all the fun is in the discussion. So I guess it’s good that we misunderstood each other 😉
Apparently Cuil was not quite ready for launch during the first day or two: many medium and long-tail queries did not return results at all, and even general queries returned far fewer results than they should have, considering Cuil’s claims of having indexed so many pages already. They did improve somewhat afterward, however, and seem to be picking up more results and increasing relevance as more people have been testing out the engine.
In the long run, I hope they get things together and perform well enough to compete with the major search engines and then maybe do some advertising. I would like to see more serious competitors to Google in order to hold their power in check and encourage more transparency overall.