Re: Slow queries / commits, mis-configuration or hardware issues?

"Tomas Vondra" <tv@xxxxxxxx> · Wed, 16 Nov 2011 17:34:10 +0100

On 16 Listopad 2011, 5:27, Greg Smith wrote:
> On 11/14/2011 01:16 PM, Cody Caughlan wrote:
>> We're starting to see some slow queries, especially COMMITs that are
>> happening more frequently. The slow queries are against seemingly
>> well-indexed tables.
>> Slow commits like:
>>
>> 2011-11-14 17:47:11 UTC pid:14366 (44/0-0) LOG:  duration: 3062.784 ms
>>   statement: COMMIT
>> 2011-11-14 17:47:11 UTC pid:14604 (48/0-0) LOG:  duration: 2593.351 ms
>>   statement: COMMIT
>>
>> These slow COMMITs are against tables that received a large number of
>> UPDATEs and are growing fairly rapidly.
>>
> Linux will fill its write cache with all of the writes coming out of
> each checkpoint.  With a 16GB instance, I would expect that 5% * 16GB ~=
> 800MB of writes are batched up when your system is slow.  You should be
> able to confirm that by looking at the "Dirty:" line in /proc/meminfo
>
> With 800MB queued up and I/O that is lucky to get 50MB/s, the sync calls
> at the end of each checkpoint are sometimes blocking for multiple seconds:
>
>> 2011-11-14 17:38:48 UTC pid:3965 (-0) LOG:  checkpoint complete: wrote
>> 15121 buffers (3.3%); 0 transaction log file(s) added, 0 removed, 8
>> recycled; write=270.101 s, sync=2.989 s, total=273.112 s; sync
>> files=60, longest=1.484 s, average=0.049 s
>> 2011-11-14 17:39:15 UTC pid:3965 (-0) LOG:  checkpoint starting: time
>> 2011-11-14 17:43:49 UTC pid:3965 (-0) LOG:  checkpoint complete: wrote
>> 16462 buffers (3.6%); 0 transaction log file(s) added, 0 removed, 9
>> recycled; write=269.978 s, sync=4.106 s, total=274.117 s; sync
>> files=82, longest=2.943 s, average=0.050 s
>>
>
> When an individual sync call gets stuck for that long, clients can
> easily get stuck behind it too.  There are a couple of techniques that
> might help:

The sync times I see there seem quite acceptable - 4.2s is not perfect,
but I wouldn't rate it as terrible. What actually annoys me is the amount
of data written - it's just 17000 pages, i.e. about 130 MB for a
checkpoint (spread over 5 minutes). So it's just like 0.5 MB/s.

> -Switch filesystems if you're running a slow one.  ext3 has bad latency
> behavior here, XFS and ext4 are better.

He's using xfs, IIRC. That's one of the better behaving ones, when it
comes to sync.

> -Lower the dirty_* tunables like dirty_background_ratio or its bytes
> version.  This will reduce average throughput, but can lower latency.
> -Spread checkpoints out more so that less average writes are happening.
> -Decrease shared_buffers so less data is getting pushed out at
> checkpoint time.
> -Reduce your reliability expectations and turn off synchronous_commit.

The question here probably is whether those high latencies are caused or
significantly influenced by the checkpoint, or are a "feature" of the
storage. Because if it's a feature, then all this is a futile attempt to
fix it.

I don't think he has problems with checkpoints - he's complaining about
regular queries being slow (even plain SELECT, i.e. something that usually
does not require a sync).

No doubt this may be connected, but a regular SELECT usually does not
perform a sync, right? It may need to fetch some data and if the I/O is
saturated by a checkpoint, this may take time. But again - those bonnie
results were collected with on a running system, i.e. with checkpoints in
progress and all of that.

And I'd expect most of the SELECT queries to be handled without actually
touching the devices, but by connecting
https://gist.github.com/5ac1ae7d11facd72913f and
https://gist.github.com/5ac1ae7d11facd72913f it seems that the larges
table (users) is almost completely in shared buffers, while the two other
large tables (external_user and facebook_friends) are cached by about 30%.
And I'd expect the rest of those tables to be in the page cache, so SELECT
queries on those tables should be fast.

A commit obviously requires a sync on the WAL - I wonder if moving the WAL
would improve the performance here.

This is obviously based on an incomplete set of stats, and maybe I'm
missing something.

> Your server is sometimes showing multi-second latency issues with
> bonnie++ too; that suggests how this problem is not even specific to
> PostgreSQL.  Linux is hard to tune for low latency under all

Don't forget those data were collected on a production system, i.e. it was
actually under load. That probably skews the results a lot.

> circumstances; fighting latency down under a heavy update workload is
> hard to do even with good hardware to accelerate write performance.  In
> an EC2 environment, it may not even be possible to do without making
> trade-offs like disabling synchronous writes.  I can easily get
> transactions hung for 10 to 15 seconds on one of their servers if I try
> to make that problem bad, you're only seeing the middle range of latency
> issues so far.

Are you talking about EBS or ephemeral storage? Because all this is about
ephemeral (something like a "virtualized local" storage).

Tomas

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance