On Thu, Sep 21, 2006 at 08:46:41PM -0700, Luke Lonergan wrote:
> Mark,
>
> On 9/21/06 8:40 PM, "mark@xxxxxxxxxxxxxx" <mark@xxxxxxxxxxxxxx> wrote:
>
> > I'd advise against using this call unless it can be shown that the page
> > will not be used in the future, or at least, that the page is less useful
> > than all other pages currently in memory. This is what the call really means.
> > It means, "There is no value to keeping this page in memory".
>
> Yes, it's a bit subtle.
>
> I think the topic is similar to "cache bypass", used in cache-capable vector
> processors (Cray, Convex, Multiflow, etc) in the 90's. When you are
> scanning through something larger than the cache, it should be marked
> "non-cacheable" and bypass caching altogether. This avoids a copy, and
> keeps the cache available for things that can benefit from it.
>
> WRT the PG buffer cache, the rule would have to be: "if the heap scan is
> going to be larger than effective_cache_size, then issue the
> posix_fadvise(BLOCK_NOT_NEEDED) call". It doesn't sound very efficient to
> do this in block/extent increments though, and it would possibly mess with
> subsets of the block space that would be re-used for other queries.

Another issue is that if you start two large seqscans on the same table at
about the same time, right now you should only be issuing one set of reads
for both requests, because one of them will just pull the blocks back out
of cache. If we weren't caching, then each query would have to physically
read (which would be horrid).

There's been talk of adding code that would have a seqscan detect if
another seqscan is happening on the table at the same time, and if it is,
to start its seqscan wherever the other seqscan is currently running. That
would probably ensure that we weren't reading from the table in 2
different places, even if we weren't caching.
--
Jim Nasby                                               jim@xxxxxxxxx
EnterpriseDB           http://enterprisedb.com          512.569.9461 (cell)
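
The advice flag being discussed is spelled POSIX_FADV_DONTNEED in POSIX
(there is no BLOCK_NOT_NEEDED constant). Below is a minimal standalone
sketch of the "drop pages behind a large sequential scan" idea, not
PostgreSQL code; the 8 MB drop interval and the command-line file argument
are arbitrary choices for illustration:

    /*
     * Sketch: read a file sequentially, telling the kernel it need not
     * keep the pages we have already consumed.
     */
    #define _POSIX_C_SOURCE 200112L
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define DROP_CHUNK (8 * 1024 * 1024)   /* drop cached pages in 8 MB steps */

    int main(int argc, char **argv)
    {
        if (argc != 2)
        {
            fprintf(stderr, "usage: %s file\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0)
        {
            perror("open");
            return 1;
        }

        /* Hint that access will be sequential, so the kernel can read ahead. */
        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

        char    buf[64 * 1024];
        off_t   done = 0, last_drop = 0;
        ssize_t n;

        while ((n = read(fd, buf, sizeof(buf))) > 0)
        {
            done += n;
            if (done - last_drop >= DROP_CHUNK)
            {
                /* "There is no value to keeping these pages in memory." */
                posix_fadvise(fd, last_drop, done - last_drop,
                              POSIX_FADV_DONTNEED);
                last_drop = done;
            }
        }

        close(fd);
        return 0;
    }

For a read-only scan this only releases clean page-cache pages, so it is
cheap; but as noted above, issuing it per block/extent from inside the
server, and deciding when the scanned range is really "larger than the
cache", is where it gets awkward.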