Hi,
On 2024/12/11 10:38, Zhiguo Niu wrote:
On Wed, Dec 11, 2024 at 4:33 AM Bart Van Assche <bvanassche@xxxxxxx> wrote:
On 12/9/24 10:22 PM, Yu Kuai wrote:
First of all, are we in agreement that it's not acceptable to
sacrifice performance in the default scenario just to ensure
functional correctness when async_depth is set to 1?
How much does this affect performance? If this affects performance
significantly, I agree that this needs to be fixed.
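For context, the interaction under discussion looks roughly like this (a
simplified sketch of block/mq-deadline.c in recent kernels; details vary
by version):

/*
 * Called when the scheduler tags are set up or resized: picks the
 * default async_depth and propagates it as the sbitmap's
 * min_shallow_depth, which in turn sizes the wake batches.
 */
static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
{
	struct request_queue *q = hctx->queue;
	struct deadline_data *dd = q->elevator->elevator_data;
	struct blk_mq_tags *tags = hctx->sched_tags;

	/* Default: async requests may use up to 75% of the tags. */
	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);

	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
}

/* Applied at tag allocation time; synchronous reads are not throttled. */
static void dd_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
{
	struct deadline_data *dd = data->q->elevator->elevator_data;

	if (op_is_sync(opf) && !op_is_write(opf))
		return;

	data->shallow_depth = dd->async_depth;
}

The tension: writing async_depth via sysfs updates dd->async_depth but
not min_shallow_depth, so a very small value (e.g. 1) breaks the
sbitmap's shallow-depth assumptions, while unconditionally lowering
min_shallow_depth instead would shrink the wake batches and hurt the
default case.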
If so, the following are the options that I can think of to fix this:
1) Make async_depth read-only: if using 75% of the tags hurts
performance in some cases, the user can increase nr_requests to
compensate.
2) Refactor the elevator sysfs API: remove eq->sysfs_lock and replace it
with q->sysfs_lock, so that deadline_async_depth_store() is protected
against concurrent hctx changes and min_shallow_depth can be updated
there (see the sketch after this list).
3) Other options?
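A rough sketch of what 2) could look like (untested; dd_set_async_depth
is a hypothetical helper, and it assumes the store handler can reach the
request_queue and runs under q->sysfs_lock so the hctx list is stable):

/*
 * Hypothetical helper for deadline_async_depth_store(): with
 * q->sysfs_lock held by the sysfs path, walking the hctxs is safe and
 * the new depth can be propagated to every sched_tags sbitmap.
 */
static void dd_set_async_depth(struct request_queue *q, unsigned int depth)
{
	struct deadline_data *dd = q->elevator->elevator_data;
	struct blk_mq_hw_ctx *hctx;
	unsigned long i;

	dd->async_depth = depth;
	queue_for_each_hw_ctx(q, hctx, i)
		sbitmap_queue_min_shallow_depth(&hctx->sched_tags->bitmap_tags,
						depth);
}

This way min_shallow_depth follows whatever the user writes, so the
default wake batching is only reduced when a small async_depth is
actually requested.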
Another option is to remove the ability to configure async_depth. If it
is too much trouble to get the implementation right without causing
regressions for existing workloads, one possibility is to remove support
for restricting the number of asynchronous requests in flight.
Hi Bart,
I think restricting asynchronous requests via async_depth is very
useful when the I/O load is very heavy.
The following is my androidbench experiment on an Android device (sched_tag=128):
1. Generate heavy IO load:
while true; do
    fio -directory=/data -direct=0 -rw=write -bs=64M -size=1G \
        -numjobs=5 -name=fiotest
done
2. Run androidbench; results:

              original async_depth   async_depth=nr_requests*3/4   delta
seq read      33.176                 216.49                        183.314
seq write     28.57                  62.152                        33.582
random read   1.518                  1.648                         0.13
random write  3.546                  4.27                          0.724
And our customers also report improvements when they test app cold
start and benchmarks after tuning async_depth.
So how are you guys setting async_depth? It looks like you're using
nr_requests*3/4. If that's the case, option 1) above would still
work for you. However, in this test, I think the lower async_depth
is, the better results you'll get.
Thanks,
Kuai
thanks!
Thanks,
Bart.