On 15.11.2011 01:13, Cody Caughlan wrote:
> The first two are what I would think would be largely read operations
> (certainly the SELECT) so its not clear why a SELECT consumes write
> time.
>
> Here is the output of some pg_stat_bgwriter stats from the last couple of hours:
>
> https://gist.github.com/41ee26caca01471a9b77

Hmm, the difference between 22:00 and 23:00 seems a bit suspicious. There
are 11 checkpoints (timed + requested), i.e. 352 segments (16 MB each),
i.e. about 5632 MB of WAL data. The checkpoints had to write 504,135
buffers (about 4 GB of data) and the background writer an additional
10,494 buffers (about 100 MB). But the backends themselves had to write
173,208 buffers (roughly 1.3 GB). That's way too much I guess, and it's
probably one of the reasons why the queries take so long.

So your primary goal probably should be to have fewer buffers written by
the backends themselves. Spread checkpoints are the most effective way to
achieve that, with the background writer as a secondary measure. Try to
eliminate the requested checkpoints (increase checkpoint_segments) and
the buffers written by backends: increase shared_buffers and watch
whether that helps. If it does not, make the background writer a bit more
aggressive - increase bgwriter_lru_maxpages and/or decrease
bgwriter_delay, so that it writes the buffers a bit more aggressively.

But if there was something extraordinary happening between 22:00 and
23:00 (e.g. a backup or a batch job), this may be complete nonsense.

> One thing that I might not have made very clear earlier is that this
> DB, especially a single table receives a very large number of UPDATEs.
> However, it seems to be well cached, I have shared_buffers = 3584MB
> and a view of pg_buffercache shows:
>
> https://gist.github.com/53c520571290cae14613

That means almost 100% of the buffers are used, but this is not a very
interesting view - a buffer may be occupied and yet not used. Can you
group the buffers by "isdirty", so we can see what portion of the buffers
was modified (and needs to be written)? See the first query below.

Much more interesting is the view from the other side - how many requests
were handled from shared buffers. For a whole database you can do that
like this:

  SELECT datname, 100 * blks_hit / (blks_hit + blks_read + 1)
    FROM pg_stat_database;

The "+1" is there just to avoid division by zero, and you should evaluate
it using two snapshots (i.e. the difference between them). You can compute
the same thing (cache hit ratio) for individual tables and indexes - see
the second query below. The caveat is that there is a separate cache (the
OS page cache) that is not included here, so a read may be a "miss" here
and yet not require an actual I/O.

> Is it possible that we're just trying to handle too many UPDATEs and
> they are all trying to hit disk all at once - causing this I/O
> contention?

Not sure what you mean by "trying to hit disk all at once"? The updates
are written to the WAL (which is mostly sequential I/O) and the actual
pages are updated in memory (shared buffers). Those shared buffers are
written out only when a checkpoint happens, and that is a single write of
the whole block, not many small writes. Sure, the clients need to grab a
lock on a buffer when modifying it, and a lock on the WAL, but that would
not show up as I/O utilization. In short - I don't think that is what is
happening here.

What actually matters here is that the dirty buffers are spread across the
drive - that's where the random I/O comes from - and the fact that the
backends need to flush the dirty buffers on their own.
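Regarding the "isdirty" grouping - something like this should show how
much of shared_buffers is actually dirty (an untested sketch; it assumes
the pg_buffercache extension is installed and the default 8kB block size):

  -- count dirty vs. clean buffers in shared_buffers
  SELECT isdirty, count(*) AS buffers,
         pg_size_pretty(count(*) * 8192) AS size   -- 8192 = default block size
    FROM pg_buffercache
   GROUP BY isdirty;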
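And the per-table cache hit ratio can be computed from
pg_statio_user_tables in roughly the same way (again an untested sketch,
with the same two-snapshot caveat; pg_statio_user_indexes has the
corresponding counters for indexes):

  -- heap cache hit ratio per table; the +1 avoids division by zero
  SELECT relname,
         100 * heap_blks_hit / (heap_blks_hit + heap_blks_read + 1) AS heap_hit_pct
    FROM pg_statio_user_tables
   ORDER BY heap_blks_read DESC;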
> Here is a view of pg_stat_user_tables that shows the
> amount of live/dead tuples:
>
> https://gist.github.com/5ac1ae7d11facd72913f

What could happen due to the UPDATEs is table bloat (the table growing
because of the updates), but I don't think that is happening here. The
number of dead tuples is very low - less than 1% in most cases. For
example the largest table, "users", has less than 0.5% dead tuples, and
most of its updates are handled as HOT updates. So I don't think this is
an issue here.

kind regards
Tomas
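PS: if you want to double-check the dead tuple and HOT update percentages
yourself, something along these lines should work (an untested sketch over
pg_stat_user_tables):

  -- dead tuple percentage and share of updates done as HOT, per table
  SELECT relname, n_live_tup, n_dead_tup,
         round(100.0 * n_dead_tup / (n_live_tup + n_dead_tup + 1), 2) AS dead_pct,
         round(100.0 * n_tup_hot_upd / (n_tup_upd + 1), 2) AS hot_upd_pct
    FROM pg_stat_user_tables
   ORDER BY n_dead_tup DESC;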