Thanks for the prompt replies so far. I have done some more investigation so that I can answer some of the questions more clearly.

The original shared_buffers was 8GB, and on Friday I did another run using that old value instead of my more recent 1GB limit. There was no noticeable improvement.

I also installed the pg_buffercache extension and, following some articles, ran:

SELECT pg_size_pretty(count(*) * 8192) as ideal_shared_buffers
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE usagecount >= 3;

Under load, the system is using just over 500MB of shared_buffers at usagecount >= 3. Our system is very write-heavy, with all of the big tables written to but not read from (at least during the load test run). Although the database will grow (under load) to 100GB in a few hours and keep growing, the amount of data in shared_buffers, according to the observation above, seems low.

We have the WAL on a different disk from the main tables. The following per-disk figures (from sar) are representative while under our load test:

06:34:01 PM       DEV       tps   rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
06:35:01 PM    dev8-0    176.15      25.89    8773.98     49.96      0.83      4.73      3.79     66.85
06:35:01 PM    dev8-1    174.74       0.13    8746.49     50.06      0.81      4.64      3.82     66.81
06:35:01 PM    dev8-2      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
06:35:01 PM    dev8-5      1.41      25.76      27.49     37.74      0.02     14.96      2.31      0.33
06:35:01 PM   dev8-16     15.02       0.00    3994.82    265.95      1.27     84.88      2.76      4.14
06:35:01 PM   dev8-17     15.02       0.00    3994.82    265.95      1.27     84.88      2.76      4.14

dev8-0 to dev8-5 represent the disk where the WAL is; dev8-16 to dev8-17 are where the main PostgreSQL data directory is located.

We have also run bonnie against the WAL disks, and the sar log for that run shows the following (examples during heavy write and during heavy read/write):

heavy write

01:28:01 PM       DEV       tps   rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
01:29:01 PM    dev8-0    371.85      12.11  377771.45   1015.97    122.93    330.46      2.69    100.00
01:29:01 PM    dev8-1    371.85      12.11  377771.45   1015.97    122.93    330.46      2.69    100.00
01:29:01 PM    dev8-2      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
01:29:01 PM    dev8-5      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
01:29:01 PM   dev8-16      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
01:29:01 PM   dev8-17      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00

heavy read/write

12:56:02 PM       DEV       tps   rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:57:01 PM    dev8-0    456.32  355622.67  109942.01   1020.26     58.34    126.70      2.17     99.00
12:57:01 PM    dev8-1    456.32  355622.67  109942.01   1020.26     58.34    126.70      2.17     99.00
12:57:01 PM    dev8-2      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
12:57:01 PM    dev8-5      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
12:57:01 PM   dev8-16      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
12:57:01 PM   dev8-17      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00

To me this shows that we are not IO-limited; at least, it is not obvious to me how we could be. I have also, previously, run our load test with the WAL written to a RAM disk (purely as a sanity check, to be sure IO was not the bottleneck), and we still had the same problem with sessions waiting on exclusive locks.

As far as checkpoints go, one does happen every 5 minutes and takes about 4.5 minutes, which corresponds to the checkpoint_completion_target of 0.9 we have set.
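For the lock waits, a query along these lines against pg_locks and pg_stat_activity should show which sessions hold the locks that others are waiting on. This is only a sketch, not output from the runs above, and it assumes PostgreSQL 9.2 or later, where pg_stat_activity exposes pid and query (older releases use procpid and current_query):

-- Sessions waiting on a lock, paired with the session holding the conflicting lock.
SELECT waiting.pid        AS waiting_pid,
       waiting.mode       AS waiting_mode,
       waiting_act.query  AS waiting_query,
       holder.pid         AS holding_pid,
       holder.mode        AS holding_mode,
       holder_act.query   AS holding_query
FROM pg_locks waiting
JOIN pg_locks holder
  ON  holder.granted
  AND holder.pid <> waiting.pid
  AND holder.locktype = waiting.locktype
  -- IS NOT DISTINCT FROM matches NULLs too, so this covers both
  -- relation locks and transactionid locks with one join condition.
  AND holder.database      IS NOT DISTINCT FROM waiting.database
  AND holder.relation      IS NOT DISTINCT FROM waiting.relation
  AND holder.page          IS NOT DISTINCT FROM waiting.page
  AND holder.tuple         IS NOT DISTINCT FROM waiting.tuple
  AND holder.transactionid IS NOT DISTINCT FROM waiting.transactionid
JOIN pg_stat_activity waiting_act ON waiting_act.pid = waiting.pid
JOIN pg_stat_activity holder_act  ON holder_act.pid  = holder.pid
WHERE NOT waiting.granted;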
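Similarly, to cross-check the checkpoint behaviour, something like the following against pg_stat_bgwriter gives a rough picture of how much writing is done by checkpoints versus backends. Again a sketch only; the exact column set varies a little by version (buffers_backend_fsync, for example, only exists from 9.1 on):

-- Checkpoint and background-writer activity since the stats were last reset.
SELECT checkpoints_timed,      -- checkpoints triggered by checkpoint_timeout
       checkpoints_req,        -- checkpoints requested because WAL filled up
       buffers_checkpoint,     -- buffers written out by checkpoints
       buffers_clean,          -- buffers written by the background writer
       buffers_backend,        -- buffers backends had to write themselves
       buffers_backend_fsync   -- fsyncs backends had to perform themselves
FROM pg_stat_bgwriter;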