On Wed, Oct 28, 2015 at 8:43 AM, Tom Dearman <tom.dearman@xxxxxxxxx> wrote: > We have a performance problem when our postgres is under high load. The CPU > usage is very low, we have 48 cores for our postgres and the idle time > averages at 90%. The problem is we get spikes in our transaction times > which don’t appear with any obvious regularity and when we get the larger > spikes, if I look in the postgres log we see that there is locking on > 'process 41915 acquired ExclusiveLock on extension of relation 27177 of > database 26192’. The actual relation changes one time it might be one table > and another time another, though they are always big tables. I have looked > at various previous threads and the only suggestions are either that the > disk io is maxed out, which from our observations we don’t believe is the > case for us, What are those observations? Keep in mind that if 20 processes are all trying to extend the relation at the same time, one will block on IO (according to top/sar/vmstat etc.) and the other 19 will block on that first one on a PostgreSQL heavy-weight lock. So all 20 of them are effectively blocked on IO, but system monitoring tools won't know that. Also, the IO spikes will be transient, so any monitoring that over-averages will not pick up on them. > or that ‘shared_buffers’ is to large - so we have reduced this > right down to 1G. In the previous threads there was an indication that the > underlying problem was a lock which I believe has been either removed or > much improved in 9.5 (see Lock scalability improvements), however we have > not seen any improvement in the relation extension locking problem that we > see. The version of 9.5 that we have tested is beta1. Any help in showing > us how to improve this would be greatly appreciated. I don't believe any of the improvements made were to this area. Your latency spikes seem to be happening at a 20 minute interval. That would make me think they are lined up with end-of-checkpoint fsync activity, except those should be happening every 5 minutes as your conf has not changed checkpoint_timeout away from the default. Since you have log_checkpoints on, what do you see in the log files about how often they occur, and what the checkpoint write time, sync time, etc. are? Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general