Not resolved. The WAL was already on SSD, we further moved it to a separate SSD (Azure's Ultra SSD) and that seemed to lessen the severity/frequency but we still see occasional spikes over the course of any given day.
What is actually interesting though is now that I've had iostat running, when we see the WALWriteLock spikes, we see the w_await and aqu-sz spike on the data directory disk, not the WAL disk. See below for an example (first two readings for baseline, then spike). sdc is the PG data directory disk (Azure Premium SSD), and dm-0 is an LVM lun wrapped around it. sdd is the WAL disk (Azure Ultra SSD). The data disk is set to ReadOnly host caching per Azure's recommendations, the WAL disk is set to None (per recommendations for log disk plus Ultra SSD doesn't allow any caching anyway).
05/12/2021 01:45:47 PM
avg-cpu: %user %nice %system %iowait %steal %idle
17.77 0.00 3.61 1.67 0.00 76.95
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 3.20 0.00 0.07 0.00 14.60 0.00 82.02 0.00 1.69 0.00 0.00 22.25 3.25 1.04
sdc 1180.00 182.80 11.39 1.41 0.00 0.00 0.00 0.00 0.71 5.22 0.77 9.88 7.93 0.67 91.20
sdd 0.00 1290.00 0.00 19.82 0.00 250.00 0.00 16.23 0.00 0.46 0.01 0.00 15.74 0.76 98.56
dm-0 1180.40 182.80 11.39 1.41 0.00 0.00 0.00 0.00 0.74 5.30 1.84 9.88 7.93 0.67 91.28
05/12/2021 01:45:52 PM
avg-cpu: %user %nice %system %iowait %steal %idle
18.71 0.00 4.02 1.54 0.00 75.74
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 3.40 0.00 0.07 0.00 15.00 0.00 81.52 0.00 1.88 0.00 0.00 21.65 3.53 1.20
sdc 1035.20 118.60 10.55 1.05 0.00 26.80 0.00 18.43 0.67 5.27 0.50 10.44 9.07 0.81 93.44
sdd 0.00 1298.00 0.00 21.64 0.00 264.40 0.00 16.92 0.00 0.49 0.01 0.00 17.08 0.76 98.96
dm-0 1035.20 145.40 10.55 1.05 0.00 0.00 0.00 0.00 0.69 5.49 1.52 10.44 7.39 0.79 93.44
05/12/2021 01:45:57 PM
avg-cpu: %user %nice %system %iowait %steal %idle
17.85 0.00 4.34 10.93 0.00 66.88
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 3.80 0.00 0.07 0.00 14.60 0.00 79.35 0.00 2.26 0.00 0.00 20.00 3.58 1.36
sdc 983.20 2831.80 8.98 36.91 0.00 0.00 0.00 0.00 6.61 38.56 108.97 9.35 13.35 0.25 94.80
sdd 0.00 1246.20 0.00 21.24 0.00 249.00 0.00 16.65 0.00 0.49 0.03 0.00 17.45 0.79 98.48
dm-0 982.80 2831.60 8.98 36.91 0.00 0.00 0.00 0.00 6.58 38.66 115.93 9.35 13.35 0.25 94.88
05/12/2021 01:46:02 PM
avg-cpu: %user %nice %system %iowait %steal %idle
19.40 0.00 4.48 2.53 0.00 73.59
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 2.20 4.00 0.04 0.09 0.00 20.00 0.00 83.33 0.36 2.20 0.00 17.45 24.00 2.32 1.44
sdc 915.00 2592.60 8.31 23.10 0.00 44.20 0.00 1.68 1.20 16.35 37.69 9.30 9.13 0.27 94.64
sdd 0.00 1234.20 0.00 24.99 0.00 253.60 0.00 17.05 0.00 0.54 0.02 0.00 20.74 0.80 99.20
dm-0 915.60 2636.80 8.31 23.10 0.00 0.00 0.00 0.00 1.20 16.19 43.80 9.30 8.97 0.27 94.64
avg-cpu: %user %nice %system %iowait %steal %idle
17.77 0.00 3.61 1.67 0.00 76.95
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 3.20 0.00 0.07 0.00 14.60 0.00 82.02 0.00 1.69 0.00 0.00 22.25 3.25 1.04
sdc 1180.00 182.80 11.39 1.41 0.00 0.00 0.00 0.00 0.71 5.22 0.77 9.88 7.93 0.67 91.20
sdd 0.00 1290.00 0.00 19.82 0.00 250.00 0.00 16.23 0.00 0.46 0.01 0.00 15.74 0.76 98.56
dm-0 1180.40 182.80 11.39 1.41 0.00 0.00 0.00 0.00 0.74 5.30 1.84 9.88 7.93 0.67 91.28
05/12/2021 01:45:52 PM
avg-cpu: %user %nice %system %iowait %steal %idle
18.71 0.00 4.02 1.54 0.00 75.74
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 3.40 0.00 0.07 0.00 15.00 0.00 81.52 0.00 1.88 0.00 0.00 21.65 3.53 1.20
sdc 1035.20 118.60 10.55 1.05 0.00 26.80 0.00 18.43 0.67 5.27 0.50 10.44 9.07 0.81 93.44
sdd 0.00 1298.00 0.00 21.64 0.00 264.40 0.00 16.92 0.00 0.49 0.01 0.00 17.08 0.76 98.96
dm-0 1035.20 145.40 10.55 1.05 0.00 0.00 0.00 0.00 0.69 5.49 1.52 10.44 7.39 0.79 93.44
05/12/2021 01:45:57 PM
avg-cpu: %user %nice %system %iowait %steal %idle
17.85 0.00 4.34 10.93 0.00 66.88
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 3.80 0.00 0.07 0.00 14.60 0.00 79.35 0.00 2.26 0.00 0.00 20.00 3.58 1.36
sdc 983.20 2831.80 8.98 36.91 0.00 0.00 0.00 0.00 6.61 38.56 108.97 9.35 13.35 0.25 94.80
sdd 0.00 1246.20 0.00 21.24 0.00 249.00 0.00 16.65 0.00 0.49 0.03 0.00 17.45 0.79 98.48
dm-0 982.80 2831.60 8.98 36.91 0.00 0.00 0.00 0.00 6.58 38.66 115.93 9.35 13.35 0.25 94.88
05/12/2021 01:46:02 PM
avg-cpu: %user %nice %system %iowait %steal %idle
19.40 0.00 4.48 2.53 0.00 73.59
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 2.20 4.00 0.04 0.09 0.00 20.00 0.00 83.33 0.36 2.20 0.00 17.45 24.00 2.32 1.44
sdc 915.00 2592.60 8.31 23.10 0.00 44.20 0.00 1.68 1.20 16.35 37.69 9.30 9.13 0.27 94.64
sdd 0.00 1234.20 0.00 24.99 0.00 253.60 0.00 17.05 0.00 0.54 0.02 0.00 20.74 0.80 99.20
dm-0 915.60 2636.80 8.31 23.10 0.00 0.00 0.00 0.00 1.20 16.19 43.80 9.30 8.97 0.27 94.64
On Mon, May 24, 2021 at 9:13 AM Vijaykumar Jain <vijaykumarjain.github@xxxxxxxxx> wrote:
Hey Don,Was this resolved?We're there any changes after moving wal to ssd ?On Wed, May 5, 2021, 10:14 PM Laurenz Albe <laurenz.albe@xxxxxxxxxxx> wrote:On Wed, 2021-05-05 at 11:25 -0500, Don Seiler wrote:
> What I want to know is if there are other events that would contribute
> to a WALWriteLock (eg writing to a data file or the postgresql.log file)
> or is it 100% writing to the WAL file.
You don't contribute to a lock...
The documentation says
(https://www.postgresql.org/docs/13/monitoring-stats.html#WAIT-EVENT-LOCK-TABLE):
> WALWrite Waiting for WAL buffers to be written to disk.
So it is contention while writing or flushing WAL.
You can read the code for the details.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Don Seiler
www.seiler.us
www.seiler.us