Re: [PATCH V3 0/5] blk-mq: improvement on handling IO during CPU hotplug

On 11/10/2019 12:55, Ming Lei wrote:
On Fri, Oct 11, 2019 at 4:54 PM John Garry <john.garry@xxxxxxxxxx> wrote:

On 10/10/2019 12:21, John Garry wrote:


As discussed before, the tags of HiSilicon V3 are HBA-wide. If you switch
to real hw queues, each hw queue has to own its independent tags.
However, that isn't supported by V3 hardware.

I am generating the tags internally in the driver now, so the hostwide
tags issue should no longer be a problem.

And, to be clear, I am not paying much attention to performance here, but
rather just to hotplugging while IO is running.

An update on testing:
I did some scripted overnight testing. The script essentially loops like
this (a rough sketch follows the list):
- online all CPUs
- run fio bound to a limited set of CPUs, covering a hctx cpumask, for 1 minute
- offline those CPUs
- wait 1 minute (> the SCSI or NVMe request timeout)
- repeat
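
For reference, here is my own reconstruction of that sort of loop (not the
actual script from this thread; the CPU range 4-7 and /dev/nvme0n1 are
assumptions):

#!/bin/bash
# Hotplug-under-IO test loop, as described above. Assumptions: CPUs 4-7
# cover one hctx's cpumask, and /dev/nvme0n1 is the device under test.
while true; do
	# online all CPUs (cpu0 usually cannot be offlined)
	for f in /sys/devices/system/cpu/cpu[1-9]*/online; do
		echo 1 > "$f"
	done

	# run fio bound to the CPUs covering the hctx mask, for 1 minute
	fio --name=hotplug --filename=/dev/nvme0n1 --direct=1 \
	    --rw=randread --ioengine=libaio --iodepth=32 \
	    --cpus_allowed=4-7 --runtime=60 --time_based &

	sleep 5		# let IO get going

	# offline those CPUs while IO is in flight
	for c in 4 5 6 7; do
		echo 0 > /sys/devices/system/cpu/cpu$c/online
	done

	# wait 1 minute (> the request timeout), then collect fio and repeat
	sleep 60
	wait
done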

SCSI is actually quite stable, but NVMe isn't. For NVMe I am finding that
some fio processes never exit, with IOPS @ 0. I don't see any NVMe
timeout reported. Did you do any NVMe testing of this sort?
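
FWIW, one way to confirm whether such fio tasks are blocked in the kernel
rather than spinning (my suggestion, not something done in this thread):

# show task state and the kernel function each fio process is sleeping in
ps -eo pid,stat,wchan:40,cmd | grep '[f]io'

# dump stack traces of all uninterruptible (D state) tasks to the kernel log
echo w > /proc/sysrq-trigger
dmesg | tail -50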


Yeah, so for NVMe, I see some sort of regression, like this:
Jobs: 1 (f=1): [_R] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
1158037877d:17h:18m:22s]

I can reproduce this issue, and it looks like there are requests stuck in
->dispatch.

OK, that may match what I see:
- the problem's appearance coincides with this callpath, hit with BLK_MQ_S_INTERNAL_STOPPED set:

blk_mq_request_bypass_insert
(__)blk_mq_try_issue_list_directly
blk_mq_sched_insert_requests
blk_mq_flush_plug_list
blk_flush_plug_list
blk_finish_plug
blkdev_direct_IO
generic_file_read_iter
blkdev_read_iter
aio_read
io_submit_one
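
For reference, this kind of callpath can be captured with a kprobe event
plus a stacktrace trigger (an assumption on my part; the thread does not
say how the trace was taken):

# trace every caller of the function that parks requests on ->dispatch
cd /sys/kernel/debug/tracing
echo 'p:bypass blk_mq_request_bypass_insert' >> kprobe_events
echo stacktrace > events/kprobes/bypass/trigger
echo 1 > events/kprobes/bypass/enable
cat trace_pipe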

blk_mq_request_bypass_insert() adds the request to the hctx dispatch list, and looking at debugfs, could this be that request sitting there:
root@(none)$ more /sys/kernel/debug/block/nvme0n1/hctx18/dispatch
00000000ac28511d {.op=READ, .cmd_flags=, .rq_flags=IO_STAT, .state=idle, .tag=56, .internal_tag=-1}

So could there be some race here?
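
For completeness, the neighbouring blk-mq debugfs attributes for that hctx
help confirm its state (a sketch; the paths follow the one quoted above):

D=/sys/kernel/debug/block/nvme0n1/hctx18
cat "$D/state"     # hctx state flags, e.g. whether the queue is stopped
cat "$D/dispatch"  # requests parked on the hctx dispatch list
cat "$D/busy"      # requests currently in flight on this hctx
cat "$D/tags"      # driver tag usage for this hctx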

I am a bit busy this week, so please feel free to investigate it; debugfs
can help you a lot. I may have time next week to look into this issue.


OK, appreciated

John

Thanks,
Ming Lei
