On 2023/6/6 02:31, Tejun Heo wrote:
> Hello,
>
> On Thu, Jun 01, 2023 at 01:39:19PM +0800, chengming.zhou@xxxxxxxxx wrote:
>> From: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
>>
>> iocost relies on rq start_time_ns and alloc_time_ns to tell the
>> saturation state of the block device.
>>
>> If any qos ->throttle() ends up blocking, the cached rq start_time_ns
>> and alloc_time_ns will include the throttled time, which can confuse
>> its user.
>
> I don't follow. rq_qos_throttle() happens before a request is allocated, so
> whether ->throttle() blocks or not doesn't affect alloc_time_ns or
> start_time_ns.

Yes, most of the time the request is allocated after rq_qos_throttle(),
so its alloc_time_ns and start_time_ns are clearly not affected.

But with the plug batched allocation introduced by commit 47c122e35d7e
("block: pre-allocate requests if plug is started and is a batch"),
rq_qos_throttle() can run after the request has already been allocated.
This is what blk_mq_get_cached_request() does.

In that case, if we block in any qos ->throttle(), the cached request's
alloc_time_ns and start_time_ns are far in the past by the time the
request is actually issued.

>
>> This patch adds a nr_flush counter in blk_plug, so we can tell if the
>> task has throttled in any qos ->throttle(), in which case we need to
>> correct the rq start_time_ns and alloc_time_ns.
>>
>> Another solution may be to make rq_qos_throttle() return bool to
>> indicate if it has throttled in any qos ->throttle(). But this needs
>> more changes.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
>
> Depending on the flush behavior and adjusting alloc_time_ns seems fragile to
> me and will likely confuse other users of alloc_time_ns too.

I agree with you that this code is not good. My basic idea is to adjust
the cached request's alloc_time_ns and start_time_ns when it has been
throttled; a rough sketch of what I mean is at the end of this mail.

>
> Maybe I'm misunderstanding the problem you're describing. Can you give a
> concrete example of how the current code would misbehave?
>

I tried using fio to reproduce it:

1. Set the iocost qos (a somewhat strict setting, to make throttling
   easy to reproduce):

   echo "259:0 enable=1 rpct=5 rlat=500 wpct=5 wlat=500" > io.cost.qos

2. Run fio with the io_uring ioengine (for now only io_uring uses the
   batched allocation):

   fio --name global --runtime 30 --time_based --size 10G --ioengine io_uring \
       --iodepth 256 --buffered 0 --sqthread_poll \
       --name job1 --rw read --cgroup job1 --numjobs 10 \
       --name job2 --rw write --cgroup job2 --numjobs 10

3. Run bpftrace to check the request start_time_ns:

   bpftrace -e 'kprobe:__rq_qos_track {
       $rq = (struct request *)arg1;
       if ($rq->start_time_ns) {
           @delta = hist((nsecs - $rq->start_time_ns)/1000);
       }
   }'

If we go blk_mq_get_cached_request() -> ->throttle() and get throttled
for some time, the returned cached request's start_time_ns is far in
the past, like below (delta values are in us):

@delta:
[0]              170090 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                 898 |                                                    |
[2, 4)              418 |                                                    |
[4, 8)              284 |                                                    |
[8, 16)              54 |                                                    |
[16, 32)            198 |                                                    |
[32, 64)           5416 |@                                                   |
[64, 128)          5082 |@                                                   |
[128, 256)         1296 |                                                    |
[256, 512)           23 |                                                    |
[512, 1K)          2632 |                                                    |
[1K, 2K)          21143 |@@@@@@                                              |
[2K, 4K)          26349 |@@@@@@@@                                            |
[4K, 8K)           4559 |@                                                   |
[8K, 16K)          4273 |@                                                   |
[16K, 32K)           14 |                                                    |
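
To make that basic idea concrete, here is a rough sketch (not the actual
diff) of the blk_mq_get_cached_request() side. plug->nr_flush is the new
counter this patch proposes, bumped when the plug gets flushed as the
task schedules out; the surrounding function shape is just my reading of
the current blk-mq code, so please treat it as illustrative:

static struct request *blk_mq_get_cached_request(struct request_queue *q,
		struct blk_plug *plug, struct bio **bio, unsigned int nsegs)
{
	struct request *rq;
	/* Snapshot before throttling; flushing the plug bumps nr_flush. */
	unsigned short nr_flush = plug->nr_flush;

	rq = rq_list_peek(&plug->cached_rq);
	if (!rq || rq->q != q)
		return NULL;

	/* ... existing op/flags checks on the cached rq elided ... */

	rq_qos_throttle(q, *bio);	/* may block in a qos ->throttle() */

	/*
	 * If ->throttle() blocked, the task scheduled out and the plug
	 * was flushed, so nr_flush moved on.  The cached rq was stamped
	 * at pre-allocation time; re-stamp it to "now" so iocost doesn't
	 * read the throttled sleep as device saturation.
	 */
	if (plug->nr_flush != nr_flush) {
		u64 now = ktime_get_ns();

		rq->start_time_ns = now;
		if (blk_queue_rq_alloc_time(q))
			rq->alloc_time_ns = now;
	}

	/* ... rest of the cached-rq setup unchanged ... */
	return rq;
}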
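
For comparison, the alternative mentioned in the changelog (making
rq_qos_throttle() return bool) would look roughly like this on the
wrapper side; the "more changes" part is that __rq_qos_throttle() and
every ->throttle() implementation would then have to report whether
they blocked:

/* Sketch: return true if any qos policy blocked the task. */
static inline bool rq_qos_throttle(struct request_queue *q, struct bio *bio)
{
	if (q->rq_qos)
		return __rq_qos_throttle(q->rq_qos, bio);
	return false;
}

blk_mq_get_cached_request() could then re-stamp the cached rq only when
rq_qos_throttle() returns true, instead of watching plug->nr_flush.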