Your experiment made far too many assumptions and the data does not stand up to scrutiny.

On 1/18/06, Alessandro Baretta <a.baretta@xxxxxxxxxxxxxxx> wrote:
> Results: I'll omit the numerical data, which everyone can easily obtain in only
> a few minutes, repeating the experiment. I used several query strings containing
> very common words ("linux debian", "linux kernel", "linux tux"), each yielding
> millions of results. I set Google to retrieve 100 results per page. Then I ran
> the query and paged through the data set. The obvious result is that execution
> time is a monotonously growing function of the page number. This clearly
> indicates that Google does not use any algorithm of the proposed kind, but
> rather an OFFSET/LIMIT strategy, thus disproving the hypothesis.

I just ran the same test and got a different outcome. The last page came back twice as fast as page 4, and I noticed no trend in the response times from page to page. Of course, the results are probably served from cache because these are such common searches, so the experiment is pointless either way. You cannot jump to conclusions based on a few searches on Google.

> It must also be noted that Google refuses to return more than 1000 results per
> query, thus indicating that the strategy they adopted quite apparently cannot
> scale indefinitely, for on a query returning a potentially flooding dataset, a
> user paging through the data would experience a linear slowdown on the number of
> pages already fetched, and the DBMS workload would also be linear on the number
> of fetched pages.

There are various reasons why Google might want to limit the number of search results returned: for example, to encourage people to narrow their search, to prevent screen scrapers from hitting them really hard, and so on. Perhaps fewer than 0.00000001% of real users (not scrapers) actually dig down to the 10th page, so what's the point of returning more?
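
For what it's worth, here is a rough sketch of the two paging approaches being argued about: re-running the query with OFFSET/LIMIT for every page, versus fetching a capped result set once and slicing pages out of it in memory. It uses Python's sqlite3 module and a made-up docs table; the names docs, score, PAGE_SIZE and MAX_RESULTS are my own assumptions for illustration, and none of this says anything about how Google actually does it.

```python
import sqlite3

PAGE_SIZE = 100      # results per page, as in the experiment quoted above
MAX_RESULTS = 1000   # hypothetical cap, mirroring the 1000-result limit


def make_demo_db(rows=5000):
    """Build an in-memory table of fake documents with a ranking score."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, score INTEGER)")
    conn.executemany(
        "INSERT INTO docs (id, score) VALUES (?, ?)",
        ((i, (i * 7919) % 10007) for i in range(1, rows + 1)),
    )
    return conn


def page_via_offset(conn, page):
    """Strategy 1: go back to the database for every page (1-based)."""
    cur = conn.execute(
        "SELECT id FROM docs ORDER BY score DESC, id LIMIT ? OFFSET ?",
        (PAGE_SIZE, (page - 1) * PAGE_SIZE),
    )
    return [row[0] for row in cur]


def fetch_capped(conn):
    """Strategy 2, step 1: run the query once, keeping at most MAX_RESULTS ids."""
    cur = conn.execute(
        "SELECT id FROM docs ORDER BY score DESC, id LIMIT ?",
        (MAX_RESULTS,),
    )
    return [row[0] for row in cur]


def page_from_cache(cached_ids, page):
    """Strategy 2, step 2: serve any page (1-based) with no further database I/O."""
    start = (page - 1) * PAGE_SIZE
    return cached_ids[start:start + PAGE_SIZE]
```

Both strategies return the same ids for pages 1 through 10. The second one touches the database once per search, and a hard cap like MAX_RESULTS is what keeps the cached list small enough to hold on to.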

There are numerous methods you can use to serve separate result pages; some involve going back to the database and some don't. I prefer not to go back to the database if I can avoid it. If all you want to do is provide a few links to further pages of results, then going back to the database and using offsets is a waste of I/O.

--
Harry
http://www.hjackson.org
http://www.uklug.co.uk