Ok, sorry.
I tried the steps below, but I could not simulate WALWriteLock waits.
The idea: on Ubuntu, create a slow loopback device, put the WAL directory on it, and run commits to see whether WALWriteLock waits show up.
# as root: build a 100 MB loopback file and wrap it in a dm-delay device
dd if=/dev/zero of=/var/tmp/postgres bs=1024k count=100
losetup --show --find /var/tmp/postgres    # gives /dev/loop8
echo "0 `blockdev --getsz /dev/loop8` delay /dev/loop8 0 1000" | dmsetup create dm-slow    # delay every I/O by 1000 ms
ls -l /dev/mapper/dm-slow
mkfs.ext4 /dev/mapper/dm-slow              # format
mkdir -p /mnt/slow
mount /dev/mapper/dm-slow /mnt/slow
ls -l /mnt/slow
mkdir /mnt/slow/postgres
chown -R postgres /mnt/slow/postgres
# now as the postgres user
su - postgres
initdb -D data -X /mnt/slow/postgres/data
That puts the WAL directory (initdb -X) on the slow device, under /mnt/slow/postgres.
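Before starting Postgres, a quick sanity check that the delay target is really biting (this check is my own illustration, not part of the history above; the path and counts are arbitrary):

dmsetup table dm-slow                                                          # should list the delay target with the 1000 ms parameter
time dd if=/dev/zero of=/mnt/slow/postgres/ddtest bs=8k count=5 oflag=dsync    # each synced write should stall roughly the configured 1000 ms
rm /mnt/slow/postgres/ddtest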
Start PostgreSQL:
pg_ctl -D data -l /tmp/logfile start
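As a further check (mine, assuming PostgreSQL 10+ where the directory is called pg_wal), the data directory should now contain a symlink onto the slow mount:

ls -l data/pg_wal            # expect: pg_wal -> /mnt/slow/postgres/data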
Monitor disk I/O to confirm that WAL commits are slow because of I/O waits:
iostat -x 1 -p dm-0 -p loop8
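The dm-N name that the dm-slow mapping gets can vary; it happened to be dm-0 here. If in doubt, something like this (my addition, assuming the usual /dev/mapper symlink layout) resolves it:

readlink -f /dev/mapper/dm-slow     # e.g. /dev/dm-0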
psql -c "create table foo(id int);"
for i in {1..10}; do psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done
The inserts themselves are fast, but each commit has to flush WAL, and that is slow on this device.
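A more direct way to see the commit-time flush (my illustration; assumes synchronous_commit is left at its default of on) is to time a single autocommitted insert against a plain read:

time psql -c "insert into foo values (1);"     # has to flush WAL at commit: takes on the order of the dm-delay setting
time psql -c "select count(*) from foo;"       # no WAL flush needed, returns quickly (the data dir is on the fast disk)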
Verified by iostat:
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.00 100.00 0.00 0.00
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 2028.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.03 100.40
loop8 0.00 0.00 0.00 0.00 0.00 0.00 2.00 12.00 0.00 0.00 5.50 6.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 1.60
But I could not see any WALWriteLock waits in pg_locks during that window.
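One caveat: WALWriteLock is an LWLock, and LWLocks never appear in pg_locks, so a wait on one would only show up in the wait_event columns of pg_stat_activity. A query along these lines, sampled while the load runs, might catch it (just an illustration; the exact wait_event spelling differs a little across major versions):

psql -c "select pid, wait_event_type, wait_event, state, left(query, 60) as query from pg_stat_activity where wait_event_type = 'LWLock';"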
(The excerpts below are from the comments in src/backend/access/transam/xlog.c.)

* To read XLogCtl->LogwrtResult, you must hold either info_lck or
* WALWriteLock. To update it, you need to hold both locks. The point of
* this arrangement is that the value can be examined by code that already
* holds WALWriteLock without needing to grab info_lck as well. In addition
* to the shared variable, each backend has a private copy of LogwrtResult,
* which is updated when convenient.
*
* The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
* (protected by info_lck), but we don't need to cache any copies of it.
*
* info_lck is only held long enough to read/update the protected variables,
* so it's a plain spinlock. The other locks are held longer (potentially
* over I/O operations), so we use LWLocks for them. These locks are:
*
* WALBufMappingLock: must be held to replace a page in the WAL buffer cache.
* It is only held while initializing and changing the mapping. If the
* contents of the buffer being replaced haven't been written yet, the mapping
* lock is released while the write is done, and reacquired afterwards.
*
* WALWriteLock: must be held to write WAL buffers to disk (XLogWrite or
* XLogFlush).
/*
* Wait for any WAL insertions < upto to finish.
*
* Returns the location of the oldest insertion that is still in-progress.
* Any WAL prior to that point has been fully copied into WAL buffers, and
* can be flushed out to disk. Because this waits for any insertions older
* than 'upto' to finish, the return value is always >= 'upto'.
*
* Note: When you are about to write out WAL, you must call this function
* *before* acquiring WALWriteLock, to avoid deadlocks. This function might
* need to wait for an insertion to finish (or at least advance to next
* uninitialized page), and the inserter might need to evict an old WAL buffer
* to make room for a new one, which in turn requires WALWriteLock.
*/
So I may be wrong in what I suggested; I guess the experts will have to weigh in.
Apologies for the diversion.
Thanks,
Vijay
On Thu, 29 Apr 2021 at 22:35, Vijaykumar Jain <vijaykumarjain.github@xxxxxxxxx> wrote:
> I guess details on io waits / iops saturation etc. metrics would need to be ruled out for further discussion.
> Do the dashboards wrt the above metrics look ok?
> We had vms with unlimited iops and no synchronous replication, so I do not recall this piling up of lock issues, at least till 1 TB dbs on ssds, till pg11.
> Googling does mention some things wrt tuning of wal buffers etc., but I believe ruling out resource exhaustion is important.
>
> On Thu, 29 Apr 2021 at 9:42 PM Don Seiler <don@xxxxxxxxx> wrote:
>> On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@xxxxxxxxxxx> wrote:
>>> My guess is that you have too many active client connections, and you are suffering
>>> from contention between the many backends that all want to write WAL.
>>> In that case, use a connection pool to limit the number of active connections.
>>
>> We do have pgbouncer in place already.
>>
>> Thanks for the replies so far. What I really want to know in this case is if there is some other PG operation that accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file storage, and we can focus our investigation there?