On Mon, 19 May 2008, Decibel! wrote:
Hrm... don't seqscans use a separate set of buffers in 8.3? While technically those won't be evicted until needed, you're unlikely to find stuff hanging around there for terribly long...
Not quite a separate set, and I have been remiss that I only skim over that in the presentation right now. Keep meaning to write down those details, and now that you ask that time is now. A quick read of the code suggests that in 8.3, the following rules apply:
1) If you are doing a scan where the table is larger than (shared_buffers/4) or you are doing a VACUUM, your backend gets allocated a ring list it keeps track of the buffers it has requested in.
2) The ring size is 8K * min(shared_buffers/8,32) which for any non-trivial buffer size cases is 256K. As you request pages they get added to the list of ones in the ring. Here shared_buffers is specified as in older versions, as a count of 8K buffers.
3) Once the ring is full and you circle around to a page that's already been used, if its usage count is <=1 (which means that nobody else has used it since it was put in there) that page will get evicted and then re-used rather than allocating a new one in the normal fashion.
4) If someone else is using the buffer, instead a new one is allocated the normal way and it replaces the original entry in the ring.
So, yes, in 8.3 it's possible that you can have sequential scans of large tables or the VACUUM data pass through the buffer cache, but not remain in it afterwards. I didn't think George would ever run into this in the specific example he asked about because of (1). This behavior only kicks in if you're scanning a table large relative to the total shared buffer cache and that didn't seem like an issue in his case.
-- * Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD