Re: Slow performance when querying millions of rows


On 06/28/2011 07:50 PM, Craig McIlwee wrote:
> I was thinking that shared buffers controlled the amount of data, primarily table and index pages, that the database could store in memory at once. Based on that assumption, I thought that a larger value would enable an entire table + index to be in memory together and speed up the query. Am I wrong?

It does to some extent.  But:

a) This amount doesn't impact query planning much if you've set a large effective_cache_size, because the planner uses effective_cache_size, not shared_buffers, to estimate how much data is already cached

b) The operating system is going to cache things outside of PostgreSQL, too

c) Data read via a sequential scan sometimes skips going into shared_buffers, to keep that cache from being swamped with any single scan

d) Until the data has actually made its way into memory, you may be pulling it in via an inefficient random process at first. By the time the cache is populated, the query you wanted a populated cache to accelerate may already have finished.
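A rough sketch of how these two settings relate (the specific values below are illustrative assumptions, not recommendations for this system):

```sql
-- Inspect the settings discussed above
SHOW shared_buffers;
SHOW effective_cache_size;

-- effective_cache_size is only a planner hint, not an allocation: it should
-- roughly reflect shared_buffers plus the OS page cache available to the
-- server.  For example, on a hypothetical 16 GB machine dedicated to
-- PostgreSQL, a common starting point might look like:
--   shared_buffers       = 4GB
--   effective_cache_size = 12GB
```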

It's possible to get insight into all of this using pg_buffercache to see exactly what's in the cache, and I've put up some talks and scripts to help with that at http://projects.2ndquadrant.com/talks that you might find useful.
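As a minimal sketch of the kind of inspection pg_buffercache allows (assuming the contrib module is installed; the query below is a common summary pattern, not one of the scripts from that page):

```sql
-- pg_buffercache ships as a contrib module; on recent releases it can be
-- installed with CREATE EXTENSION, on older ones via its contrib SQL script.

-- Summarize shared_buffers usage by relation: how many 8 kB buffers each
-- table or index in the current database occupies in the cache right now.
SELECT c.relname,
       count(*) AS buffers,
       pg_size_pretty(count(*) * 8192) AS cached
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```

Comparing this output before and after running the slow query shows whether the table and index actually stay resident, which is the question the shared_buffers sizing assumption hinges on.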

> Looks like they are set to adaptive read-ahead now. If the database is executing many concurrent queries, is it reasonable to suspect that the IO requests will compete with each other in such a way that the controller would rarely see many sequential requests since it is serving many processes? The controller does have an 'on' option also that forces read-ahead, maybe that would solve the issue if we can rely on the data to survive in the cache until the actual read request takes place.

I've never been able to find good documentation on what the difference between the adaptive and on modes of that controller really is, which is why I suggested you try both and see. Linux has a uniquely good read-ahead model that was tuned with PostgreSQL specifically in mind, and even there you still have to tweak it upwards from the defaults for the database to fetch data as fast as the drives are capable. So your idea that you will meet or exceed the drives' capabilities for bulk sequential scans is less likely than you might think. RAID5 in theory should give you 2X or more the speed of any single disk when reading a clustered table, but the way PostgreSQL issues those reads may make that hard to realize on Windows.
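For reference, the Linux-side tuning alluded to above is done per block device; a sketch, assuming a data volume at /dev/sda (a placeholder, substitute your actual device):

```shell
# Show the current read-ahead setting, in 512-byte sectors
blockdev --getra /dev/sda

# Raise read-ahead to 4096 sectors (2 MB), a common starting point for
# sequential-scan-heavy workloads; benchmark before and after rather than
# treating any particular value as correct.
blockdev --setra 4096 /dev/sda
```

This doesn't apply to the Windows controller in question, but it illustrates why "the OS defaults already saturate the drives" is not a safe assumption even on the best-tuned platform.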

--
Greg Smith   2ndQuadrant US    greg@xxxxxxxxxxxxxxx   Baltimore, MD
Comprehensive and Customized PostgreSQL Training Classes:
http://www.2ndquadrant.us/postgresql-training/


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


