Ok, sorry.
I tried the steps below, but I could not simulate WALWriteLock waits.
The idea: on Ubuntu, create a slow loopback device, put the WAL directory on it, and run commits to see whether WALWriteLock waits show up.
# as root: build a 100 MB loopback file and wrap it in a dm-delay device
dd if=/dev/zero of=/var/tmp/postgres bs=1024k count=100
losetup --show --find /var/tmp/postgres    # gives /dev/loop8
echo "0 `blockdev --getsz /dev/loop8` delay /dev/loop8 0 1000" | dmsetup create dm-slow    # delay every I/O by 1000 ms
ls -l /dev/mapper/dm-slow
mkfs.ext4 /dev/mapper/dm-slow              # format
mkdir -p /mnt/slow
mount /dev/mapper/dm-slow /mnt/slow
ls -l /mnt/slow
mkdir /mnt/slow/postgres
chown -R postgres /mnt/slow/postgres
# now as the postgres user
su - postgres
initdb -D data -X /mnt/slow/postgres/data
That puts the WAL directory (initdb -X) on the slow device, under /mnt/slow/postgres.
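Before starting Postgres, a quick sanity check that the delay target is really biting (this check is my own illustration, not part of the history above; the path and counts are arbitrary):

dmsetup table dm-slow                                                          # should list the delay target with the 1000 ms parameter
time dd if=/dev/zero of=/mnt/slow/postgres/ddtest bs=8k count=5 oflag=dsync    # each synced write should stall roughly the configured 1000 ms
rm /mnt/slow/postgres/ddtest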
Start PostgreSQL:
pg_ctl -D data -l /tmp/logfile start
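As a further check (mine, assuming PostgreSQL 10+ where the directory is called pg_wal), the data directory should now contain a symlink onto the slow mount:

ls -l data/pg_wal            # expect: pg_wal -> /mnt/slow/postgres/data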
Monitor disk I/O to confirm that WAL commits are slow because of I/O waits:
iostat -x 1 -p dm-0 -p loop8
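The dm-N name that the dm-slow mapping gets can vary; it happened to be dm-0 here. If in doubt, something like this (my addition, assuming the usual /dev/mapper symlink layout) resolves it:

readlink -f /dev/mapper/dm-slow     # e.g. /dev/dm-0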
psql -c "create table foo(id int);"
for i in {1..10}; do psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done
The inserts themselves are fast, but each commit has to flush WAL, and that is slow on this device.
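A more direct way to see the commit-time flush (my illustration; assumes synchronous_commit is left at its default of on) is to time a single autocommitted insert against a plain read:

time psql -c "insert into foo values (1);"     # has to flush WAL at commit: takes on the order of the dm-delay setting
time psql -c "select count(*) from foo;"       # no WAL flush needed, returns quickly (the data dir is on the fast disk)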
Verified by iostat:
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.00 100.00 0.00 0.00
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 2028.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.03 100.40
loop8 0.00 0.00 0.00 0.00 0.00 0.00 2.00 12.00 0.00 0.00 5.50 6.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 1.60
But I could not see any WALWriteLock waits in pg_locks during that window.
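One caveat: WALWriteLock is an LWLock, and LWLocks never appear in pg_locks, so a wait on one would only show up in the wait_event columns of pg_stat_activity. A query along these lines, sampled while the load runs, might catch it (just an illustration; the exact wait_event spelling differs a little across major versions):

psql -c "select pid, wait_event_type, wait_event, state, left(query, 60) as query from pg_stat_activity where wait_event_type = 'LWLock';"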
(The excerpts below are from the comments in src/backend/access/transam/xlog.c.)

* To read XLogCtl->LogwrtResult, you must hold either info_lck or
* WALWriteLock. To update it, you need to hold both locks. The point of
* this arrangement is that the value can be examined by code that already
* holds WALWriteLock without needing to grab info_lck as well. In addition
* to the shared variable, each backend has a private copy of LogwrtResult,
* which is updated when convenient.
*
* The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
* (protected by info_lck), but we don't need to cache any copies of it.
*
* info_lck is only held long enough to read/update the protected variables,
* so it's a plain spinlock. The other locks are held longer (potentially
* over I/O operations), so we use LWLocks for them. These locks are:
*
* WALBufMappingLock: must be held to replace a page in the WAL buffer cache.
* It is only held while initializing and changing the mapping. If the
* contents of the buffer being replaced haven't been written yet, the mapping
* lock is released while the write is done, and reacquired afterwards.
*
* WALWriteLock: must be held to write WAL buffers to disk (XLogWrite or
* XLogFlush).
/*
* Wait for any WAL insertions < upto to finish.
*
* Returns the location of the oldest insertion that is still in-progress.
* Any WAL prior to that point has been fully copied into WAL buffers, and
* can be flushed out to disk. Because this waits for any insertions older
* than 'upto' to finish, the return value is always >= 'upto'.
*
* Note: When you are about to write out WAL, you must call this function
* *before* acquiring WALWriteLock, to avoid deadlocks. This function might
* need to wait for an insertion to finish (or at least advance to next
* uninitialized page), and the inserter might need to evict an old WAL buffer
* to make room for a new one, which in turn requires WALWriteLock.
*/
So I may be wrong in what I suggested; I guess the experts will have to weigh in.
Apologies for the diversion.
Thanks,
Vijay
On Thu, 29 Apr 2021 at 22:35, Vijaykumar Jain <vijaykumarjain.github@xxxxxxxxx> wrote:
> I guess details on io waits / iops saturation etc. metrics would need to be ruled out for further discussion.
> Do the dashboards wrt the above metrics look ok?
> We had vms with unlimited iops and no synchronous replication, so I do not recall this piling up of lock issues, at least till 1 TB dbs on ssds, till pg11.
> Googling does mention some things wrt tuning of wal buffers etc., but I believe ruling out resource exhaustion is important.
>
> On Thu, 29 Apr 2021 at 9:42 PM Don Seiler <don@xxxxxxxxx> wrote:
>> On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@xxxxxxxxxxx> wrote:
>>> My guess is that you have too many active client connections, and you are suffering
>>> from contention between the many backends that all want to write WAL.
>>> In that case, use a connection pool to limit the number of active connections.
>>
>> We do have pgbouncer in place already.
>>
>> Thanks for the replies so far. What I really want to know in this case is if there is some other PG operation that accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file storage, and we can focus our investigation there?