can we reduce bio_set_dev overhead due to bio_associate_blkg?

Hey Tejun and Dennis,

I recently found that, due to its call to bio_associate_blkg(),
bio_set_dev() consumes much more CPU than ideal, especially when doing
4K IOs via io_uring's HIPRI bio-polling.

I'm very naive about blk-cgroups, so I'm hopeful you or others can
help me cut through this and understand what the ideal outcome should
be for DM's clone-and-remap-heavy use-case as it relates to
bio_associate_blkg().

If I hack dm-linear with a local __bio_set_dev() that simply drops
the call to bio_associate_blkg(), my IOPS go from ~980K to 995K.
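
For reference, the hack is roughly this (a simplified sketch of the
local helper, not a verbatim diff; I'm paraphrasing bio_set_dev()'s
flag handling):

static inline void __bio_set_dev(struct bio *bio, struct block_device *bdev)
{
	bio_clear_flag(bio, BIO_REMAPPED);
	bio->bi_bdev = bdev;
	/* NOTE: bio_associate_blkg(bio) intentionally not called here */
}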

Looking at what is happening in this DM bio-cloning use-case:
__bio_clone() calls bio_clone_blkg_association() to clone the blkg
from the DM device, then dm-linear.c:linear_map's call to
bio_set_dev() causes bio_associate_blkg(bio) to reuse the css but
still triggers a blkg update because the bdev in the bio is being
remapped (linear_map sends the IO to the real underlying device). The
end result _seems_ like collectively wasteful effort to get the
blk-cgroup resources set up properly in the face of a simple remap.
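
To make the remap side concrete, here is roughly where that second
association happens (a trimmed, approximate excerpt of
drivers/md/dm-linear.c:linear_map_bio(), not verbatim):

static void linear_map_bio(struct dm_target *ti, struct bio *bio)
{
	struct linear_c *lc = ti->private;

	/*
	 * The clone already carries a blkg inherited via
	 * bio_clone_blkg_association() from __bio_clone(); this
	 * bio_set_dev() then redoes the blkg association because
	 * bi_bdev now points at the underlying device.
	 */
	bio_set_dev(bio, lc->dev->bdev);
	if (bio_sectors(bio))
		bio->bi_iter.bi_sector =
			linear_map_sector(ti, bio->bi_iter.bi_sector);
}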

It seems the current DM pattern causes this repeat blkg work for
_every_ remapped bio?  Do you see a way to speed up these repeat calls
to bio_associate_blkg()?

Test kernel is my latest dm-5.19 branch (though the latest Linus
5.18-rc0 kernel should be fine too):
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.19

I'm using dm-linear on top of a 16G blk-mq null_blk device:

modprobe null_blk queue_mode=2 poll_queues=2 bs=4096 gb=16
SIZE=`blockdev --getsz /dev/nullb0`
echo "0 $SIZE linear /dev/nullb0 0" | dmsetup create linear

And I'm running the workload with fio using this wrapper script, invoked as:
io_uring.sh 20 1 /dev/mapper/linear 4096

#!/bin/bash
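# Usage: io_uring.sh <runtime_secs> <numjobs> <device> <blocksize>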

RTIME=$1
JOBS=$2
DEV=$3
BS=$4

QD=64
BATCH=16
HI=1

fio --bs=$BS --ioengine=io_uring --fixedbufs --registerfiles --hipri=$HI \
        --iodepth=$QD \
        --iodepth_batch_submit=$BATCH \
        --iodepth_batch_complete_min=$BATCH \
        --filename=$DEV \
        --direct=1 --runtime=$RTIME --numjobs=$JOBS --rw=randread \
        --name=test --group_reporting


