On 08/04/2020 16:14, Daniel Wagner wrote:
> On Wed, Apr 08, 2020 at 02:29:51PM +0100, John Garry wrote:
>> On 08/04/2020 14:10, Daniel Wagner wrote:
>> ok, but to really test this you need to ensure that all the cpus for a
>> managed interrupt affinity mask are offlined together for some period of
>> time greater than the IO timeout. Otherwise the hw queue's managed
>> interrupt would not be shut down, and you're not verifying that the
>> queues are fully drained.

Hi Daniel,

> Not sure if I understand you correctly: Are you saying that the IRQ
> related resources are not freed/moved from the offlining CPU?

This series tries to drain the hw queue when all cpus in the queue (IRQ)
affinity mask are being offlined. This is because when all the cpus are
offlined, the managed IRQ for that hw queue is shut down - so there are
no cpus remaining online to service the completion interrupt for
in-flight IOs. The cover letter may explain this better.

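To force this path in my testing I offline all the cpus for one hw queue
together and keep them down for longer than the IO timeout, something
like the below (sda, the queue index, and the cpu numbers are just
examples from one setup):

  # which cpus service hw queue 2 of sda (one mq/<n> dir per hctx)
  cat /sys/block/sda/mq/2/cpu_list

  # offline all of them together, so the managed IRQ for that queue is
  # shut down and the queue has to be fully drained beforehand
  for cpu in 8 9 10 11; do
      echo 0 > /sys/devices/system/cpu/cpu$cpu/online
  done
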
>> Will the fio processes migrate back onto cpus which have been onlined
>> again?

> Hmm, good question. I've tried to assign them to a specific CPU via
> --cpus_allowed_policy=split and --cpus_allowed.
>
>   fio --rw=randwrite --name=test --size=50M --iodepth=32 --direct=1 \
>     --bs=4k --numjobs=40 --time_based --runtime=1h --ioengine=libaio \
>     --group_reporting --cpus_allowed_policy=split --cpus_allowed=0-40
>
> Though I haven't verified what happens when the CPU gets back online.

>> Maybe this will work since you're offlining patterns of cpus and the
>> fio processes have to migrate somewhere. But see above.

> At least after the initial setup a fio thread will be migrated away
> from the offlining CPU.
>
> A quick test shows that the affinity mask for a fio thread is cleared
> when the CPU goes offline. There seems to be a discussion going on
> about cpu hotplug and the affinity mask:
>
> https://lore.kernel.org/lkml/1251528473.590671.1579196495905.JavaMail.zimbra@xxxxxxxxxxxx
>
> TL;DR: a task can be scheduled back onto a re-onlined CPU if its
> affinity was tweaked via e.g. taskset, but it won't be if it was set
> via cpusets.

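Right - for reference, the task affinity is easy enough to sanity check
before and after the hotplug events, something like this (untested as
written, and the pid lookup is just an example):

  pid=$(pgrep -fn fio)                      # newest fio process
  grep Cpus_allowed_list /proc/$pid/status  # kernel's view of the mask
  taskset -pc $pid                          # same info, via util-linux
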
I just avoid any of this in my test by looping in a sequence of onlining
all cpus, starting fio for a short period, and then offlining cpus.

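Roughly like this - untested as typed here, and the cpu list and fio
arguments are only placeholders:

  while true; do
      # online everything so the fio jobs can spread across all cpus
      for c in /sys/devices/system/cpu/cpu[0-9]*/online; do
          echo 1 > $c
      done

      # short time_based fio run in the background
      fio --name=test --rw=randwrite --size=50M --bs=4k --iodepth=32 \
          --direct=1 --ioengine=libaio --numjobs=8 \
          --time_based --runtime=30 &

      sleep 5

      # offline the cpus for the hw queue under test while IO is in flight
      for cpu in 8 9 10 11; do
          echo 0 > /sys/devices/system/cpu/cpu$cpu/online
      done

      wait
  done
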
BTW, you mentioned earlier that you would test megaraid_sas. As things
stand, I don't think that series will help there, as that driver still
just exposes a single HW queue to blk-mq. I think the qla2xxx driver
does expose >1 queues, i.e. it sets Scsi_Host.nr_hw_queues, so it may
be a better option.

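An easy way to check what a driver exposes is to count the blk-mq sysfs
directories for one of its disks (device names here are just
placeholders):

  # one mq/<n> directory per hw queue exposed to blk-mq
  ls /sys/block/sda/mq/          # single-queue driver -> just "0"
  ls /sys/block/sdb/mq/ | wc -l  # multi-queue driver -> the queue count
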
Cheers,
John