"Mark Mielke" <mark@xxxxxxxxxxxxxx> writes: > Matthew wrote: > >> I don't think you would have to create a more intelligent table scanning >> algorithm. What you would need to do is take the results of the index, >> convert that to a list of page fetches, then pass that list to the OS as >> an asynchronous "please fetch all these into the buffer cache" request, >> then do the normal algorithm as is currently done. The requests would then >> come out of the cache instead of from the disc. Of course, this is from a >> simple Java programmer who doesn't know the OS interfaces for this sort of >> thing. > > That's about how the talk went. :-) > > The problem is that a 12X speed for 12 disks seems unlikely except under very > specific loads (such as a sequential scan of a single table). Each of the > indexes may need to be scanned or searched in turn, then each of the tables > would need to be scanned or searched in turn, depending on the query plan. > There is no guarantee that the index rows or the table rows are equally spread > across the 12 disks. CPU processing becomes involved with is currently limited > to a single processor thread. I suspect no database would achieve a 12X speedup > for 12 disks unless a simple sequential scan of a single table was required, in > which case the reads could be fully parallelized with RAID 0 using standard > sequential reads, and this is available today using built-in OS or disk > read-ahead. I'm sure you would get something between 1x and 12x though... I'm rerunning my synthetic readahead tests now. That doesn't show the effect of the other cpu and i/o work being done in the meantime but surely if they're being evicted from cache too soon that just means your machine is starved for cache and you should add more RAM? Also, it's true, you need to preread more than 12 blocks to handle a 12-disk raid. 
My offhand combinatorics analysis seems to indicate you would expect to need
n(n-1)/2 blocks on average before you've hit all the disks. There's little
penalty to prereading unless you use up too many kernel resources or you do
unnecessary i/o which you never use, so I would expect doing n^2 capped at
some reasonable number like 1,000 pages (enough to handle a 32-disk raid)
would be reasonable.

The real trick is avoiding doing prefetches that are never needed. The user
may never actually read all the tuples being requested. I think that means we
shouldn't prefetch until the second tuple is read and then gradually increase
the prefetch distance as you read more and more of the results.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!
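
The sizing and ramp-up ideas in the message above can be sketched as follows.
This is an illustration under stated assumptions, not an implementation from
any existing code. It assumes blocks land on disks uniformly and
independently, in which case the classical coupon-collector result gives
n * H_n expected blocks before every disk has been touched (about 37 for 12
disks). That is below the offhand n(n-1)/2 estimate (66 for 12 disks), so the
n^2 rule of thumb stays comfortably above either figure. The doubling ramp is
also an assumption, since the message only says to increase the distance
gradually. All function names are hypothetical.

```python
from fractions import Fraction


def expected_blocks_to_touch_all_disks(n_disks):
    """Expected number of uniformly random blocks needed before each of
    n_disks has served at least one (coupon collector: n * H_n)."""
    harmonic = sum(Fraction(1, k) for k in range(1, n_disks + 1))
    return float(n_disks * harmonic)


def max_prefetch_distance(n_disks, cap=1000):
    """The n^2 rule of thumb from the message, capped at `cap` pages."""
    return min(n_disks * n_disks, cap)


def prefetch_distance(tuples_read, n_disks, cap=1000):
    """Prefetch distance once `tuples_read` tuples have been consumed.

    Nothing is prefetched until the second tuple is read. After that the
    distance doubles with each further tuple until it reaches the limit.
    Doubling is an assumed ramp, chosen only for illustration.
    """
    if tuples_read < 2:
        return 0
    limit = max_prefetch_distance(n_disks, cap)
    # 2 tuples -> 1 page, 3 -> 2, 4 -> 4, ...; clamp the shift so that very
    # large tuple counts do not build huge integers before the min().
    shift = min(tuples_read - 2, limit.bit_length())
    return min(1 << shift, limit)
```

With this ramp, a query that is abandoned after the first tuple triggers no
prefetching at all, while a scan that keeps going reaches the full distance
for a 12-disk array (144 pages) within ten tuples.
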