On Sun, Dec 16, 2018 at 12:12:17PM -0700, Jens Axboe wrote:
> On 12/16/18 11:39 AM, Mike Snitzer wrote:
> > On Sun, Dec 16 2018 at 11:16am -0500,
> > Christoph Hellwig <hch@xxxxxx> wrote:
> >
> >> On Sun, Dec 16, 2018 at 10:25:16AM +0800, Ming Lei wrote:
> >>> This patch sets map->nr_queues as zero explicitly if there are zero
> >>> queues for such queue type, then blk_mq_map_swqueue() can become
> >>> more robust to deal with shared mappings.
> >>
> >> This looks a lot more clumsy than what we had before, can you explain
> >> what additional robustness it buys us?
> >
> > It enables nvme IO to complete on my testbed with for-4.21/block
> > changes, this NUMA layout is what triggered Ming's work:
> >
> > # numactl --hardware
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 2 4 6 8 10 12 14
> > node 0 size: 128605 MB
> > node 0 free: 128092 MB
> > node 1 cpus: 1 3 5 7 9 11 13 15
> > node 1 size: 128997 MB
> > node 1 free: 128529 MB
> > node distances:
> > node   0   1
> >   0:  10  21
> >   1:  21  10
> >
> > Without the aggregate changes from this patchset (1-3 anyway) I get IO
> > hangs in blkdev_fsync().
>
> Still puzzled. I'm not against the change, but the commit message has
> NOTHING in the way of justification. "Make X more robust" doesn't
> mean anything. Your followup brings no extra info to the table in
> terms of what the bug is here. What exact bug is it fixing? Why is
> fsync currently hanging?

As I explained in the previous mail, the poll queue gets a new mapping
via blk_mq_map_queues(), then blk_mq_map_swqueue() may over-write
hctx->type with this new mapping, and finally the queue mapping is
totally broken.

Thanks,
Ming