Re: [PATCH 0/3] blk-mq/dm-rq: support BLK_MQ_F_BLOCKING for dm-rq

Mike Snitzer <snitzer@xxxxxxxxxx> · Wed, 19 Jan 2022 16:03:16 -0500

On Mon, Jan 17 2022 at  3:10P -0500,
Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:

> On Tue, Jan 11, 2022 at 01:23:53PM -0500, Jeff Moyer wrote:
> > Maybe I have bad taste, but the patches didn't look like cruft to me.
> > :)
> 
> They do to me.  The extend the corner case of request on request
> stacking that already is a bit of mess even more by adding yet another
> special case in the block layer.

Ming's first patch:
https://patchwork.kernel.org/project/dm-devel/patch/20211221141459.1368176-2-ming.lei@xxxxxxxxxx/
is pure cleanup for the mess that went in this merge cycle.

All that dm-rq context aside, Ming's 1st patch is the correct way to
clean up the block core flags/state (internal vs external, etc).

But the 2nd paragraph of that first patch's header should be moved to
Ming's 2nd patch because it explains why DM needs the new
blk_alloc_disk_srcu() interface, e.g.:
"But dm queue is allocated before tagset is allocated"
(so it confirms it best to explain that in 2nd patch).

But really even Ming's 2nd patch:
https://patchwork.kernel.org/project/dm-devel/patch/20211221141459.1368176-3-ming.lei@xxxxxxxxxx/
should _not_ need to be debated like this.

Fact is alloc_disk() has always mirrored blk_alloc_queue()'s args.  So
Ming's 2nd patch should be done anyway to expose meaningful control
over request_queue allocation.  If anything, the 2nd patch should just
add the 'alloc_srcu' arg to blk_alloc_disk() and change all but NVMe
callers to pass false.

Put another way: when the 'node_id' arg was added to blk_alloc_queue()
a new blk_alloc_disk_numa_node() wasn't added (despite most block
drivers still only using NUMA_NO_NODE).  This new 'alloc_srcu' flag is
seen to be more niche, but it really should be no different on an
interface symmetry and design level.

> > I'm not sure why we'd prevent users from using dm-mpath on nvmeof.  I
> > think there's agreement that the nvme native multipath implementation is
> > the preferred way (that's the default in rhel9, even), but I don't think
> > that's a reason to nack this patch set.
> > 
> > Or have I missed your point entirely?
> 
> No you have not missed the point.  nvme-multipath exists longer than
> the nvme-tcp driver and is the only supported one for it upstream for
> a good reason.  If RedHat wants to do their own weirdo setups they can
> patch their kernels, but please leave the upstrem code alone.

Patch 3 can be left out if you'd like to force your world view on
everyone... you've already "won", _pretty please_ stop being so
punitive by blocking reasonable change.  We really can get along if
we're all willing to be intellectually honest.

To restate: Ming's patches 1 and 2 really are not "cruft".  They
expose control over request_queue allocation that should be accessible
by both blk_alloc_queue() and blk_alloc_disk().

Mike