On 15.11.2011 01:13, Cody Caughlan wrote:
> The first two are what I would think would be largely read operations
> (certainly the SELECT) so its not clear why a SELECT consumes write
> time.
>
> Here is the output of some pg_stat_bgwriter stats from the last couple of hours:
>
> https://gist.github.com/41ee26caca01471a9b77

Hmm, the difference between 22:00 and 23:00 seems a bit suspicious. There
are 11 checkpoints (timed + requested), i.e. 352 segments (16 MB each),
i.e. about 5632 MB of WAL data. The checkpoints had to write 504,135
buffers (about 4 GB of data) and the background writer an additional
10,494 buffers (about 100 MB). But the backends themselves had to write
173,208 buffers (roughly 1.3 GB). That's way too much I guess, and it's
probably one of the reasons why the queries take so long.

So your primary goal probably should be to have fewer buffers written by
the backends themselves. Spread checkpoints are the most effective way to
achieve that, with the background writer as a secondary measure. Try to
eliminate the requested checkpoints (increase checkpoint_segments) and
the buffers written by backends: increase shared_buffers and watch
whether that helps. If it does not, make the background writer a bit more
aggressive - increase bgwriter_lru_maxpages and/or decrease
bgwriter_delay, so that it writes the buffers a bit more aggressively.

But if there was something extraordinary happening between 22:00 and
23:00 (e.g. a backup or a batch job), this may be complete nonsense.

> One thing that I might not have made very clear earlier is that this
> DB, especially a single table receives a very large number of UPDATEs.
> However, it seems to be well cached, I have shared_buffers = 3584MB
> and a view of pg_buffercache shows:
>
> https://gist.github.com/53c520571290cae14613

That means almost 100% of the buffers are used, but this is not a very
interesting view - a buffer may be occupied and yet not used. Can you
group the buffers by "isdirty", so we can see what portion of the buffers
was modified (and needs to be written)? See the first query below.

Much more interesting is the view from the other side - how many requests
were handled from shared buffers. For a whole database you can do that
like this:

  SELECT datname, 100 * blks_hit / (blks_hit + blks_read + 1)
    FROM pg_stat_database;

The "+1" is there just to avoid division by zero, and you should evaluate
it using two snapshots (i.e. the difference between them). You can compute
the same thing (cache hit ratio) for individual tables and indexes - see
the second query below. The caveat is that there is a separate cache (the
OS page cache) that is not included here, so a read may be a "miss" here
and yet not require an actual I/O.

> Is it possible that we're just trying to handle too many UPDATEs and
> they are all trying to hit disk all at once - causing this I/O
> contention?

Not sure what you mean by "trying to hit disk all at once"? The updates
are written to the WAL (which is mostly sequential I/O) and the actual
pages are updated in memory (shared buffers). Those shared buffers are
written out only when a checkpoint happens, and that is a single write of
the whole block, not many small writes. Sure, the clients need to grab a
lock on a buffer when modifying it, and a lock on the WAL, but that would
not show up as I/O utilization. In short - I don't think that is what is
happening here.

What actually matters here is that the dirty buffers are spread across the
drive - that's where the random I/O comes from - and the fact that the
backends need to flush the dirty buffers on their own.
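Regarding the "isdirty" grouping - something like this should show how
much of shared_buffers is actually dirty (an untested sketch; it assumes
the pg_buffercache extension is installed and the default 8kB block size):

  -- count dirty vs. clean buffers in shared_buffers
  SELECT isdirty, count(*) AS buffers,
         pg_size_pretty(count(*) * 8192) AS size   -- 8192 = default block size
    FROM pg_buffercache
   GROUP BY isdirty;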
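And the per-table cache hit ratio can be computed from
pg_statio_user_tables in roughly the same way (again an untested sketch,
with the same two-snapshot caveat; pg_statio_user_indexes has the
corresponding counters for indexes):

  -- heap cache hit ratio per table; the +1 avoids division by zero
  SELECT relname,
         100 * heap_blks_hit / (heap_blks_hit + heap_blks_read + 1) AS heap_hit_pct
    FROM pg_statio_user_tables
   ORDER BY heap_blks_read DESC;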
> Here is a view of pg_stat_user_tables that shows the
> amount of live/dead tuples:
>
> https://gist.github.com/5ac1ae7d11facd72913f

What could happen due to the UPDATEs is table bloat (the table growing
because of the updates), but I don't think that is happening here. The
number of dead tuples is very low - less than 1% in most cases. For
example the largest table, "users", has less than 0.5% dead tuples, and
most of its updates are handled as HOT updates. So I don't think this is
an issue here.

kind regards
Tomas
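PS: if you want to double-check the dead tuple and HOT update percentages
yourself, something along these lines should work (an untested sketch over
pg_stat_user_tables):

  -- dead tuple percentage and share of updates done as HOT, per table
  SELECT relname, n_live_tup, n_dead_tup,
         round(100.0 * n_dead_tup / (n_live_tup + n_dead_tup + 1), 2) AS dead_pct,
         round(100.0 * n_tup_hot_upd / (n_tup_upd + 1), 2) AS hot_upd_pct
    FROM pg_stat_user_tables
   ORDER BY n_dead_tup DESC;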