Hi Dave,

When the following fio command is executed,

fio --eta=always --output=runlogs/randwrite4k_64jobs.out -name fio.test \
    --directory=/data --rw=randwrite --bs=4k --size=4G --ioengine=libaio \
    --iodepth=16 --direct=1 --time_based=1 --runtime=900 --randrepeat=1 \
    --gtod_reduce=1 --group_reporting=1 --numjobs=64

on an XFS instance with the following geometry,

# xfs_info /dev/tank/lvm
meta-data=/dev/mapper/tank-lvm   isize=512    agcount=32, agsize=97675376 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=3125612032, imaxpct=5
         =                       sunit=16     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# dmsetup table
tank-lvm: 0 25004900352 striped 4 128 259:3 2048 259:2 2048 259:5 2048 259:8 2048

# lsblk
nvme0n1          259:0    0  2.9T  0 disk
└─nvme0n1p1      259:3    0  2.9T  0 part
  └─tank-lvm     252:0    0 11.7T  0 lvm  /data
nvme1n1          259:1    0  2.9T  0 disk
└─nvme1n1p1      259:2    0  2.9T  0 part
  └─tank-lvm     252:0    0 11.7T  0 lvm  /data
nvme2n1          259:4    0  2.9T  0 disk
└─nvme2n1p1      259:5    0  2.9T  0 part
  └─tank-lvm     252:0    0 11.7T  0 lvm  /data
nvme3n1          259:6    0  2.9T  0 disk
└─nvme3n1p1      259:8    0  2.9T  0 part
  └─tank-lvm     252:0    0 11.7T  0 lvm  /data
...

the following results are produced:

------------------------------------------------------
Kernel                                       Write IOPS
------------------------------------------------------
v5.4                                              1050k
b843299ba5f9a430dd26ecd02ee2fef805f19844          1040k
0e7ab7efe77451cba4cbecb6c9f5ef83cf32b36b           835k
v5.17-rc4                                          909k
------------------------------------------------------

Commit 0e7ab7efe77451cba4cbecb6c9f5ef83cf32b36b ("xfs: Throttle commits on
delayed background CIL push") causes tasks which commit transactions to the
CIL to get blocked (when cil->xc_ctx->space_used >=
XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) until the CIL push worker has executed.
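For reference, here is a small sketch of where that throttle point sits for
this filesystem's log. The two helper names are mine, and the fractions are
my reading of the kernel source (XLOG_CIL_SPACE_LIMIT() being logsize/8 and
the blocking limit, added by the commit above, being twice that), so treat
the exact values as an assumption rather than a quote of the code:

```python
# Hypothetical stand-ins for the kernel's CIL space limits (names and
# fractions are assumptions based on my reading of fs/xfs/xfs_log_priv.h).

def cil_space_limit(logsize):
    """Background CIL push is scheduled past this point (logsize / 8)."""
    return logsize >> 3

def cil_blocking_space_limit(logsize):
    """Committing tasks block past this point (logsize / 4)."""
    return cil_space_limit(logsize) * 2

# Log geometry from the xfs_info output above: 521728 blocks of 4096 bytes.
logsize = 521728 * 4096
print("log size:        ", logsize)
print("background push: ", cil_space_limit(logsize))
print("commits block at:", cil_blocking_space_limit(logsize))
```

So with a ~2 GiB internal log, committers would start sleeping once the CIL
holds roughly half a gigabyte of uncommitted changes.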
The following procedure seems to indicate that the drop in performance could
be due to a large number of tasks being blocked.

1. Insert the following probe point (on v5.17-rc4),

   # perf probe -a 'xlog_cil_push_work:29 dev=log->l_mp->m_super->s_dev:u32 curr_cycle=log->l_curr_cycle:s32 curr_block=log->l_curr_block:s32' --vmlinux=/root/chandan/junk/build/linux/vmlinux

2. Execute the following command line,

   # perf record -e probe:xlog_cil_push_work_L29 -e xfs:xfs_log_cil_wait -g -a -- <fio command line>

3. Summarize the perf data using the python program available from
   https://gist.github.com/chandanr/ee9b4f33cb194d61fe885bc7b4180a9b

   # perf script -i perf.data -s perf-script.py
   Maximum number of waiting tasks: 83
   Average number of waiting tasks: 59
   Maximum waiting time: 1.976929619
   Total waiting (secs.nsecs): (38.550612754)

--
chandan
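PS: for anyone reproducing this without the gist, the aggregation it performs
can be sketched roughly as below. This is a simplified stand-in, not the gist
itself: it assumes a flat stream of (timestamp, event) pairs, with each
xfs_log_cil_wait event marking a task starting to wait on the CIL and each
hit of the xlog_cil_push_work probe waking every waiter:

```python
# Simplified sketch of the perf-script aggregation (assumed event model:
# a "wait" event blocks one task; a "push" event wakes all blocked tasks).

def summarize(events):
    waiting = []          # timestamps of currently blocked tasks
    max_waiting = 0       # peak number of simultaneously blocked tasks
    counts_at_push = []   # blocked-task count observed at each push
    max_wait = 0.0        # longest single wait, in seconds
    total_wait = 0.0      # sum of all wait durations, in seconds

    for ts, name in events:
        if name == "xfs_log_cil_wait":
            waiting.append(ts)
            max_waiting = max(max_waiting, len(waiting))
        elif name == "xlog_cil_push_work":
            counts_at_push.append(len(waiting))
            for start in waiting:       # push worker wakes every waiter
                dur = ts - start
                max_wait = max(max_wait, dur)
                total_wait += dur
            waiting.clear()

    avg = sum(counts_at_push) / len(counts_at_push) if counts_at_push else 0
    return max_waiting, avg, max_wait, total_wait

# Tiny made-up event stream to show the shape of the output.
events = [
    (0.0, "xfs_log_cil_wait"),
    (0.5, "xfs_log_cil_wait"),
    (1.0, "xlog_cil_push_work"),
    (1.2, "xfs_log_cil_wait"),
    (2.0, "xlog_cil_push_work"),
]
print(summarize(events))
```

The real script hooks into perf's Python scripting interface instead of
taking a pre-built list, but the bookkeeping is the same idea.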