Performance regression when a large number of tasks perform random I/O

Hi Dave,

When the following fio command is executed,

fio --eta=always --output=runlogs/randwrite4k_64jobs.out -name fio.test
--directory=/data --rw=randwrite --bs=4k --size=4G --ioengine=libaio
--iodepth=16 --direct=1 --time_based=1 --runtime=900 --randrepeat=1
--gtod_reduce=1 --group_reporting=1 --numjobs=64 

on an XFS instance having the following geometry,

# xfs_info /dev/tank/lvm
meta-data=/dev/mapper/tank-lvm   isize=512    agcount=32, agsize=97675376 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=3125612032, imaxpct=5
         =                       sunit=16     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# dmsetup table
tank-lvm: 0 25004900352 striped 4 128 259:3 2048 259:2 2048 259:5 2048 259:8 2048

# lsblk
nvme0n1      259:0    0   2.9T  0 disk
└─nvme0n1p1  259:3    0   2.9T  0 part
  └─tank-lvm 252:0    0  11.7T  0 lvm  /data
nvme1n1      259:1    0   2.9T  0 disk
└─nvme1n1p1  259:2    0   2.9T  0 part
  └─tank-lvm 252:0    0  11.7T  0 lvm  /data
nvme2n1      259:4    0   2.9T  0 disk
└─nvme2n1p1  259:5    0   2.9T  0 part
  └─tank-lvm 252:0    0  11.7T  0 lvm  /data
nvme3n1      259:6    0   2.9T  0 disk
└─nvme3n1p1  259:8    0   2.9T  0 part
  └─tank-lvm 252:0    0  11.7T  0 lvm  /data
  

... the following results are produced:

------------------------------------------------------
 Kernel                                    Write IOPS 
------------------------------------------------------
 v5.4                                      1050k
 b843299ba5f9a430dd26ecd02ee2fef805f19844  1040k      
 0e7ab7efe77451cba4cbecb6c9f5ef83cf32b36b  835k       
 v5.17-rc4                                 909k       
------------------------------------------------------

Commit 0e7ab7efe77451cba4cbecb6c9f5ef83cf32b36b ("xfs: Throttle commits on
delayed background CIL push") causes tasks committing transactions to the CIL
to block (when cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log))
until the CIL push worker has run.
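
For reference, the throttle added by that commit works roughly as sketched
below. This is a simplified paraphrase of xlog_cil_push_background(), not the
verbatim kernel code; it omits the xc_ctx_lock handling, and field/helper
names may differ between kernel versions:

/*
 * Simplified sketch of the CIL background-push throttle (paraphrased,
 * not verbatim kernel code).  Every transaction commit passes through
 * this path after inserting its log items into the CIL.
 */
static void xlog_cil_push_background(struct xlog *log)
{
	struct xfs_cil	*cil = log->l_cilp;

	/* Below the background push threshold: nothing to do yet. */
	if (cil->xc_ctx->space_used < XLOG_CIL_SPACE_LIMIT(log))
		return;

	/* Queue the background CIL push work if it isn't already pending. */
	spin_lock(&cil->xc_push_lock);
	if (cil->xc_push_seq < cil->xc_current_sequence) {
		cil->xc_push_seq = cil->xc_current_sequence;
		queue_work(log->l_mp->m_cil_workqueue, &cil->xc_push_work);
	}

	/*
	 * Hard throttle: once the current CIL context has consumed this
	 * much log space, the committing task sleeps until the push worker
	 * swaps in a new context and wakes the waiters.  This is the wait
	 * that shows up as xfs:xfs_log_cil_wait in the traces below.
	 */
	if (cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
		trace_xfs_log_cil_wait(log, cil->xc_ctx->ticket);
		/* xlog_wait() drops xc_push_lock before sleeping. */
		xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock);
		return;
	}

	spin_unlock(&cil->xc_push_lock);
}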

The following procedure seems to indicate that the drop in performance could
be due to a large number of tasks being blocked on this throttle.

1. Insert the following probe point (on v5.17-rc4),
   # perf probe -a 'xlog_cil_push_work:29 dev=log->l_mp->m_super->s_dev:u32
   curr_cycle=log->l_curr_cycle:s32 curr_block=log->l_curr_block:s32'
   --vmlinux=/root/chandan/junk/build/linux/vmlinux

2. Execute the following command line,
   # perf record -e probe:xlog_cil_push_work_L29 -e xfs:xfs_log_cil_wait -g -a
   -- <fio command line>

3. Summarize the perf data using the python program available from
   https://gist.github.com/chandanr/ee9b4f33cb194d61fe885bc7b4180a9b

   # perf script -i perf.data -s perf-script.py
   Maximum number of waiting tasks: 83
   Average number of waiting tasks: 59
   Maximum waiting time: 1.976929619
   Total waiting (secs.nsecs): (38.550612754)

-- 
chandan
