On Wed, Jun 15, 2022 at 04:02:36PM -0400, Mike Snitzer wrote: > On Wed, Jun 15 2022 at 1:50P -0400, > Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > > > On 6/15/22 08:29, Mike Snitzer wrote: > > > On Wed, Jun 15 2022 at 10:36P -0400, > > > Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > > > > > > > On Mon, Jun 13, 2022 at 11:13:21AM +0200, Greg KH wrote: > > > > > On Fri, Jun 10, 2022 at 11:11:00AM -0400, Mike Snitzer wrote: > > > > > > On Fri, Jun 10 2022 at 1:15P -0400, > > > > > > Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > > > > > > > > > > > > > On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote: > > > > > > > > I believe this commit introduced a regression in dm verity on systems > > > > > > > > where data device is an NVME one. Loading table fails with the > > > > > > > > following diagnostics: > > > > > > > > > > > > > > > > device-mapper: table: table load rejected: including non-request-stackable devices > > > > > > > > > > > > > > > > The same kernel works with the same data drive on the SCSI interface. > > > > > > > > NVME-backed dm verity works with just this commit reverted. > > > > > > > > > > > > > > > > I believe the presence of the immutable partition is used as an indicator > > > > > > > > of special case NVME configuration and if the data device's name starts > > > > > > > > with "nvme" the code tries to switch the target type to > > > > > > > > DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010). > > > > > > > > > > > > > > > > The special NVME optimization case was removed in > > > > > > > > 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is > > > > > > > > affected. > > > > > > > > > > > > > > > > > > > > > > Why wouldn't 4.9, 4.14, and 4.19 also be affected here? Should I also > > > > > > > just queue up 9c37de297f65 ("dm: remove special-casing of bio-based > > > > > > > immutable singleton target on NVMe") to those older kernels? If so, > > > > > > > have you tested this and verified that it worked? > > > > > > > > > > > > Sorry for the unforeseen stable@ troubles here! > > > > > > > > > > > > In general we'd be fine to apply commit 9c37de297f65 but to do it > > > > > > properly would require also making sure commits that remove > > > > > > "DM_TYPE_NVME_BIO_BASED", like 8d47e65948dd ("dm mpath: remove > > > > > > unnecessary NVMe branching in favor of scsi_dh checks") are applied -- > > > > > > basically any lingering references to DM_TYPE_NVME_BIO_BASED need to > > > > > > be removed. > > > > > > > > > > > > The commit header for 8d47e65948dd documents what > > > > > > DM_TYPE_NVME_BIO_BASED was used for.. it was dm-mpath specific and > > > > > > "nvme" mode really never got used by any userspace that I'm aware of. > > > > > > > > > > > > Sadly I currently don't have the time to do this backport for all N > > > > > > stable kernels... :( > > > > > > > > > > > > But if that backport gets out of control: A simpler, albeit stable@ > > > > > > unicorn, way to resolve this is to simply revert 9c37de297f65 and make > > > > > > > > 9c37de297f65 can not be reverted in 5.4 and older because it isn't there, > > > > and trying to apply it results in conflicts which at least I can not > > > > resolve. > > > > > > > > > > it so that DM-mpath and DM core just used bio-based if "nvme" is > > > > > > requested by dm-mpath, so also in drivers/md/dm-mpath.c e.g.: > > > > > > > > > > > > @@ -1091,8 +1088,6 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m) > > > > > > > > > > > > if (!strcasecmp(queue_mode_name, "bio")) > > > > > > m->queue_mode = DM_TYPE_BIO_BASED; > > > > > > else if (!strcasecmp(queue_mode_name, "nvme")) > > > > > > - m->queue_mode = DM_TYPE_NVME_BIO_BASED; > > > > > > + m->queue_mode = DM_TYPE_BIO_BASED; > > > > > > else if (!strcasecmp(queue_mode_name, "rq")) > > > > > > m->queue_mode = DM_TYPE_REQUEST_BASED; > > > > > > else if (!strcasecmp(queue_mode_name, "mq")) > > > > > > > > > > > > Mike > > > > > > > > > > > > > > > > Ok, please submit a working patch for the kernels that need it so that > > > > > we can review and apply it to solve this regression. > > > > > > > > > > > > > So, effectively, v5.4.y and older are broken right now for use cases > > > > with dm on NVME drives. > > > > > > > > Given that the regression does affect older branches, and given that we > > > > have to revert this patch to avoid regressions in ChromeOS, would it be > > > > possible to revert it from v5.4.y and older until a fix is found ? > > > > > > I obviously would prefer to not have this false-start. > > > > > The false start has already happened since we had to revert the patch > > from chromeos-5.4 and older branches. > > OK, well this is pretty easy to fix in general. If there are slight > differences across older trees they are easily resolved. Fact that > stable@ couldn't cope with backporting 9c37de297f65 is.. what it is. > > But this will fix the issue on 5.4.y: > > From: Mike Snitzer <snitzer@xxxxxxxxxx> > Date: Wed, 15 Jun 2022 14:07:09 -0400 > Subject: [5.4.y PATCH] dm: remove special-casing of bio-based immutable singleton target on NVMe > > Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream. > > There is no benefit to DM special-casing NVMe. Remove all code used to > establish DM_TYPE_NVME_BIO_BASED. > > Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx> > --- > drivers/md/dm-table.c | 32 ++---------------- > drivers/md/dm.c | 64 +++-------------------------------- > include/linux/device-mapper.h | 1 - > 3 files changed, 7 insertions(+), 90 deletions(-) > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c > index 06b382304d92..81bc36a43b32 100644 > --- a/drivers/md/dm-table.c > +++ b/drivers/md/dm-table.c > @@ -872,8 +872,7 @@ EXPORT_SYMBOL(dm_consume_args); > static bool __table_type_bio_based(enum dm_queue_mode table_type) > { > return (table_type == DM_TYPE_BIO_BASED || > - table_type == DM_TYPE_DAX_BIO_BASED || > - table_type == DM_TYPE_NVME_BIO_BASED); > + table_type == DM_TYPE_DAX_BIO_BASED); > } > > static bool __table_type_request_based(enum dm_queue_mode table_type) > @@ -929,8 +928,6 @@ bool dm_table_supports_dax(struct dm_table *t, > return true; > } > > -static bool dm_table_does_not_support_partial_completion(struct dm_table *t); > - > static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev, > sector_t start, sector_t len, void *data) > { > @@ -960,7 +957,6 @@ static int dm_table_determine_type(struct dm_table *t) > goto verify_bio_based; > } > BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED); > - BUG_ON(t->type == DM_TYPE_NVME_BIO_BASED); > goto verify_rq_based; > } > > @@ -999,15 +995,6 @@ static int dm_table_determine_type(struct dm_table *t) > if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) || > (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) { > t->type = DM_TYPE_DAX_BIO_BASED; > - } else { > - /* Check if upgrading to NVMe bio-based is valid or required */ > - tgt = dm_table_get_immutable_target(t); > - if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) { > - t->type = DM_TYPE_NVME_BIO_BASED; > - goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */ > - } else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) { > - t->type = DM_TYPE_NVME_BIO_BASED; > - } > } > return 0; > } > @@ -1024,8 +1011,7 @@ static int dm_table_determine_type(struct dm_table *t) > * (e.g. request completion process for partial completion.) > */ > if (t->num_targets > 1) { > - DMERR("%s DM doesn't support multiple targets", > - t->type == DM_TYPE_NVME_BIO_BASED ? "nvme bio-based" : "request-based"); > + DMERR("request-based DM doesn't support multiple targets"); > return -EINVAL; > } > > @@ -1714,20 +1700,6 @@ static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev, > return q && !blk_queue_add_random(q); > } > > -static int device_is_partial_completion(struct dm_target *ti, struct dm_dev *dev, > - sector_t start, sector_t len, void *data) > -{ > - char b[BDEVNAME_SIZE]; > - > - /* For now, NVMe devices are the only devices of this class */ > - return (strncmp(bdevname(dev->bdev, b), "nvme", 4) != 0); > -} > - > -static bool dm_table_does_not_support_partial_completion(struct dm_table *t) > -{ > - return !dm_table_any_dev_attr(t, device_is_partial_completion, NULL); > -} > - > static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev, > sector_t start, sector_t len, void *data) > { > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > index 37b8bb4d80f0..3c45c389ded9 100644 > --- a/drivers/md/dm.c > +++ b/drivers/md/dm.c > @@ -1000,7 +1000,7 @@ static void clone_endio(struct bio *bio) > struct mapped_device *md = tio->io->md; > dm_endio_fn endio = tio->ti->type->end_io; > > - if (unlikely(error == BLK_STS_TARGET) && md->type != DM_TYPE_NVME_BIO_BASED) { > + if (unlikely(error == BLK_STS_TARGET)) { > if (bio_op(bio) == REQ_OP_DISCARD && > !bio->bi_disk->queue->limits.max_discard_sectors) > disable_discard(md); > @@ -1340,10 +1340,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio) > /* the bio has been remapped so dispatch it */ > trace_block_bio_remap(clone->bi_disk->queue, clone, > bio_dev(io->orig_bio), sector); > - if (md->type == DM_TYPE_NVME_BIO_BASED) > - ret = direct_make_request(clone); > - else > - ret = generic_make_request(clone); drivers/md/dm.c:1340:24: error: unused variable 'md' I'll try again with this fixed. Guenter > + ret = generic_make_request(clone); > break; > case DM_MAPIO_KILL: > if (unlikely(swap_bios_limit(ti, clone))) { > @@ -1732,51 +1729,6 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md, > return ret; > } > > -/* > - * Optimized variant of __split_and_process_bio that leverages the > - * fact that targets that use it do _not_ have a need to split bios. > - */ > -static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map, > - struct bio *bio, struct dm_target *ti) > -{ > - struct clone_info ci; > - blk_qc_t ret = BLK_QC_T_NONE; > - int error = 0; > - > - init_clone_info(&ci, md, map, bio); > - > - if (bio->bi_opf & REQ_PREFLUSH) { > - struct bio flush_bio; > - > - /* > - * Use an on-stack bio for this, it's safe since we don't > - * need to reference it after submit. It's just used as > - * the basis for the clone(s). > - */ > - bio_init(&flush_bio, NULL, 0); > - flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC; > - ci.bio = &flush_bio; > - ci.sector_count = 0; > - error = __send_empty_flush(&ci); > - bio_uninit(ci.bio); > - /* dec_pending submits any data associated with flush */ > - } else { > - struct dm_target_io *tio; > - > - ci.bio = bio; > - ci.sector_count = bio_sectors(bio); > - if (__process_abnormal_io(&ci, ti, &error)) > - goto out; > - > - tio = alloc_tio(&ci, ti, 0, GFP_NOIO); > - ret = __clone_and_map_simple_bio(&ci, tio, NULL); > - } > -out: > - /* drop the extra reference count */ > - dec_pending(ci.io, errno_to_blk_status(error)); > - return ret; > -} > - > static blk_qc_t dm_process_bio(struct mapped_device *md, > struct dm_table *map, struct bio *bio) > { > @@ -1807,8 +1759,6 @@ static blk_qc_t dm_process_bio(struct mapped_device *md, > /* regular IO is split by __split_and_process_bio */ > } > > - if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED) > - return __process_bio(md, map, bio, ti); > return __split_and_process_bio(md, map, bio); > } > > @@ -2200,12 +2150,10 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t, > if (request_based) > dm_stop_queue(q); > > - if (request_based || md->type == DM_TYPE_NVME_BIO_BASED) { > + if (request_based) { > /* > - * Leverage the fact that request-based DM targets and > - * NVMe bio based targets are immutable singletons > - * - used to optimize both dm_request_fn and dm_mq_queue_rq; > - * and __process_bio. > + * Leverage the fact that request-based DM targets are > + * immutable singletons - used to optimize dm_mq_queue_rq. > */ > md->immutable_target = dm_table_get_immutable_target(t); > } > @@ -2334,7 +2282,6 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t) > break; > case DM_TYPE_BIO_BASED: > case DM_TYPE_DAX_BIO_BASED: > - case DM_TYPE_NVME_BIO_BASED: > dm_init_congested_fn(md); > break; > case DM_TYPE_NONE: > @@ -3070,7 +3017,6 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_qu > switch (type) { > case DM_TYPE_BIO_BASED: > case DM_TYPE_DAX_BIO_BASED: > - case DM_TYPE_NVME_BIO_BASED: > pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size); > front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone); > io_front_pad = roundup(front_pad, __alignof__(struct dm_io)) + offsetof(struct dm_io, tio); > diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h > index a53d7d2c2d95..60631f3abddb 100644 > --- a/include/linux/device-mapper.h > +++ b/include/linux/device-mapper.h > @@ -28,7 +28,6 @@ enum dm_queue_mode { > DM_TYPE_BIO_BASED = 1, > DM_TYPE_REQUEST_BASED = 2, > DM_TYPE_DAX_BIO_BASED = 3, > - DM_TYPE_NVME_BIO_BASED = 4, > }; > > typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t; > -- > 2.30.0 > -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel