On 06/28/2011 07:50 PM, Craig McIlwee wrote:
I was thinking that shared buffers controlled the amount of data,
primarily table and index pages, that the database could store in
memory at once. Based on that assumption, I thought that a larger
value would enable an entire table + index to be in memory together
and speed up the query. Am I wrong?
It does to some extent. But:
a) This amount doesn't impact query planning as much if you've set a
large effective_cache_size
b) The operating system is going to cache things outside of PostgreSQL, too
c) Data read via a sequential scan sometimes skips going into
shared_buffers, to keep that cache from being swamped with any single scan
d) until the data has actually made its way into memory, you may be
pulling it in there by an inefficient random process at first. By the
time the cache is populated, the thing you wanted a populated cache to
accelerate may already have finished.
It's possible to get insight into this all using pg_buffercache to
actually see what's in the cache, and I've put up some talks and scripts
to help with that at http://projects.2ndquadrant.com/talks you might
find useful.
Looks like they are set to adaptive read-ahead now. If the database
is executing many concurrent queries, is it reasonable to suspect that
the IO requests will compete with each other in such a way that the
controller would rarely see many sequential requests since it is
serving many processes? The controller does have an 'on' option also
that forces read-ahead, maybe that would solve the issue if we can
rely on the data to survive in the cache until the actual read request
takes place.
I've never been able to find good documentation on just what the
difference between the adaptive and on modes of that controller really
are, which is why I suggested you try both and see. Linux has a
uniquely good read-ahead model that was tuned with PostgreSQL
specifically in mind. And you still have to tweak it upwards from the
defaults in order for the database to fetch things as fast as the drives
are capable sometimes. So your idea that you will meet/exceed the
drive's capabilities for bulk sequential scans is less likely than you
might think. RAID5 in theory should give you 2X or more of the speed of
any single disk when reading a clustered table, but the way PostgreSQL
does it may make that hard to realize on Windows.
--
Greg Smith 2ndQuadrant US greg@xxxxxxxxxxxxxxx Baltimore, MD
Comprehensive and Customized PostgreSQL Training Classes:
http://www.2ndquadrant.us/postgresql-training/
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance