Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/15/22 13:02, Mike Snitzer wrote:
On Wed, Jun 15 2022 at  1:50P -0400,
Guenter Roeck <linux@xxxxxxxxxxxx> wrote:

On 6/15/22 08:29, Mike Snitzer wrote:
On Wed, Jun 15 2022 at 10:36P -0400,
Guenter Roeck <linux@xxxxxxxxxxxx> wrote:

On Mon, Jun 13, 2022 at 11:13:21AM +0200, Greg KH wrote:
On Fri, Jun 10, 2022 at 11:11:00AM -0400, Mike Snitzer wrote:
On Fri, Jun 10 2022 at  1:15P -0400,
Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:

On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
I believe this commit introduced a regression in dm verity on systems
where data device is an NVME one. Loading table fails with the
following diagnostics:

device-mapper: table: table load rejected: including non-request-stackable devices

The same kernel works with the same data drive on the SCSI interface.
NVME-backed dm verity works with just this commit reverted.

I believe the presence of the immutable partition is used as an indicator
of special case NVME configuration and if the data device's name starts
with "nvme" the code tries to switch the target type to
DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).

The special NVME optimization case was removed in
5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
affected.


Why wouldn't 4.9, 4.14, and 4.19 also be affected here?  Should I also
just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
immutable singleton target on NVMe") to those older kernels?  If so,
have you tested this and verified that it worked?

Sorry for the unforeseen stable@ troubles here!

In general we'd be fine to apply commit 9c37de297f65 but to do it
properly would require also making sure commits that remove
"DM_TYPE_NVME_BIO_BASED", like 8d47e65948dd ("dm mpath: remove
unnecessary NVMe branching in favor of scsi_dh checks") are applied --
basically any lingering references to DM_TYPE_NVME_BIO_BASED need to
be removed.

The commit header for 8d47e65948dd documents what
DM_TYPE_NVME_BIO_BASED was used for.. it was dm-mpath specific and
"nvme" mode really never got used by any userspace that I'm aware of.

Sadly I currently don't have the time to do this backport for all N
stable kernels... :(

But if that backport gets out of control: A simpler, albeit stable@
unicorn, way to resolve this is to simply revert 9c37de297f65 and make

9c37de297f65 can not be reverted in 5.4 and older because it isn't there,
and trying to apply it results in conflicts which at least I can not
resolve.

it so that DM-mpath and DM core just used bio-based if "nvme" is
requested by dm-mpath, so also in drivers/md/dm-mpath.c e.g.:

@@ -1091,8 +1088,6 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)

                          if (!strcasecmp(queue_mode_name, "bio"))
                                  m->queue_mode = DM_TYPE_BIO_BASED;
			else if (!strcasecmp(queue_mode_name, "nvme"))
-                               m->queue_mode = DM_TYPE_NVME_BIO_BASED;
+                               m->queue_mode = DM_TYPE_BIO_BASED;
                          else if (!strcasecmp(queue_mode_name, "rq"))
                                  m->queue_mode = DM_TYPE_REQUEST_BASED;
                          else if (!strcasecmp(queue_mode_name, "mq"))

Mike


Ok, please submit a working patch for the kernels that need it so that
we can review and apply it to solve this regression.


So, effectively, v5.4.y and older are broken right now for use cases
with dm on NVME drives.

Given that the regression does affect older branches, and given that we
have to revert this patch to avoid regressions in ChromeOS, would it be
possible to revert it from v5.4.y and older until a fix is found ?

I obviously would prefer to not have this false-start.

The false start has already happened since we had to revert the patch
from chromeos-5.4 and older branches.

OK, well this is pretty easy to fix in general.  If there are slight
differences across older trees they are easily resolved.  Fact that
stable@ couldn't cope with backporting 9c37de297f65 is.. what it is.

But this will fix the issue on 5.4.y:

From: Mike Snitzer <snitzer@xxxxxxxxxx>
Date: Wed, 15 Jun 2022 14:07:09 -0400
Subject: [5.4.y PATCH] dm: remove special-casing of bio-based immutable singleton target on NVMe

Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream.

There is no benefit to DM special-casing NVMe. Remove all code used to
establish DM_TYPE_NVME_BIO_BASED.

Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>

This patch passes our tests after I removed the unused variable.

Tested-by: Guenter Roeck <linux@xxxxxxxxxxxx>

Thanks a lot for the backport!

Guenter

---
  drivers/md/dm-table.c         | 32 ++----------------
  drivers/md/dm.c               | 64 +++--------------------------------
  include/linux/device-mapper.h |  1 -
  3 files changed, 7 insertions(+), 90 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 06b382304d92..81bc36a43b32 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -872,8 +872,7 @@ EXPORT_SYMBOL(dm_consume_args);
  static bool __table_type_bio_based(enum dm_queue_mode table_type)
  {
  	return (table_type == DM_TYPE_BIO_BASED ||
-		table_type == DM_TYPE_DAX_BIO_BASED ||
-		table_type == DM_TYPE_NVME_BIO_BASED);
+		table_type == DM_TYPE_DAX_BIO_BASED);
  }
static bool __table_type_request_based(enum dm_queue_mode table_type)
@@ -929,8 +928,6 @@ bool dm_table_supports_dax(struct dm_table *t,
  	return true;
  }
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t);
-
  static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev,
  				  sector_t start, sector_t len, void *data)
  {
@@ -960,7 +957,6 @@ static int dm_table_determine_type(struct dm_table *t)
  			goto verify_bio_based;
  		}
  		BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED);
-		BUG_ON(t->type == DM_TYPE_NVME_BIO_BASED);
  		goto verify_rq_based;
  	}
@@ -999,15 +995,6 @@ static int dm_table_determine_type(struct dm_table *t)
  		if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
  		    (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
  			t->type = DM_TYPE_DAX_BIO_BASED;
-		} else {
-			/* Check if upgrading to NVMe bio-based is valid or required */
-			tgt = dm_table_get_immutable_target(t);
-			if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-				goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
-			} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-			}
  		}
  		return 0;
  	}
@@ -1024,8 +1011,7 @@ static int dm_table_determine_type(struct dm_table *t)
  	 * (e.g. request completion process for partial completion.)
  	 */
  	if (t->num_targets > 1) {
-		DMERR("%s DM doesn't support multiple targets",
-		      t->type == DM_TYPE_NVME_BIO_BASED ? "nvme bio-based" : "request-based");
+		DMERR("request-based DM doesn't support multiple targets");
  		return -EINVAL;
  	}
@@ -1714,20 +1700,6 @@ static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
  	return q && !blk_queue_add_random(q);
  }
-static int device_is_partial_completion(struct dm_target *ti, struct dm_dev *dev,
-					sector_t start, sector_t len, void *data)
-{
-	char b[BDEVNAME_SIZE];
-
-	/* For now, NVMe devices are the only devices of this class */
-	return (strncmp(bdevname(dev->bdev, b), "nvme", 4) != 0);
-}
-
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t)
-{
-	return !dm_table_any_dev_attr(t, device_is_partial_completion, NULL);
-}
-
  static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev,
  					 sector_t start, sector_t len, void *data)
  {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 37b8bb4d80f0..3c45c389ded9 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1000,7 +1000,7 @@ static void clone_endio(struct bio *bio)
  	struct mapped_device *md = tio->io->md;
  	dm_endio_fn endio = tio->ti->type->end_io;
- if (unlikely(error == BLK_STS_TARGET) && md->type != DM_TYPE_NVME_BIO_BASED) {
+	if (unlikely(error == BLK_STS_TARGET)) {
  		if (bio_op(bio) == REQ_OP_DISCARD &&
  		    !bio->bi_disk->queue->limits.max_discard_sectors)
  			disable_discard(md);
@@ -1340,10 +1340,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
  		/* the bio has been remapped so dispatch it */
  		trace_block_bio_remap(clone->bi_disk->queue, clone,
  				      bio_dev(io->orig_bio), sector);
-		if (md->type == DM_TYPE_NVME_BIO_BASED)
-			ret = direct_make_request(clone);
-		else
-			ret = generic_make_request(clone);
+		ret = generic_make_request(clone);
  		break;
  	case DM_MAPIO_KILL:
  		if (unlikely(swap_bios_limit(ti, clone))) {
@@ -1732,51 +1729,6 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
  	return ret;
  }
-/*
- * Optimized variant of __split_and_process_bio that leverages the
- * fact that targets that use it do _not_ have a need to split bios.
- */
-static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
-			      struct bio *bio, struct dm_target *ti)
-{
-	struct clone_info ci;
-	blk_qc_t ret = BLK_QC_T_NONE;
-	int error = 0;
-
-	init_clone_info(&ci, md, map, bio);
-
-	if (bio->bi_opf & REQ_PREFLUSH) {
-		struct bio flush_bio;
-
-		/*
-		 * Use an on-stack bio for this, it's safe since we don't
-		 * need to reference it after submit. It's just used as
-		 * the basis for the clone(s).
-		 */
-		bio_init(&flush_bio, NULL, 0);
-		flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
-		ci.bio = &flush_bio;
-		ci.sector_count = 0;
-		error = __send_empty_flush(&ci);
-		bio_uninit(ci.bio);
-		/* dec_pending submits any data associated with flush */
-	} else {
-		struct dm_target_io *tio;
-
-		ci.bio = bio;
-		ci.sector_count = bio_sectors(bio);
-		if (__process_abnormal_io(&ci, ti, &error))
-			goto out;
-
-		tio = alloc_tio(&ci, ti, 0, GFP_NOIO);
-		ret = __clone_and_map_simple_bio(&ci, tio, NULL);
-	}
-out:
-	/* drop the extra reference count */
-	dec_pending(ci.io, errno_to_blk_status(error));
-	return ret;
-}
-
  static blk_qc_t dm_process_bio(struct mapped_device *md,
  			       struct dm_table *map, struct bio *bio)
  {
@@ -1807,8 +1759,6 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
  		/* regular IO is split by __split_and_process_bio */
  	}
- if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED)
-		return __process_bio(md, map, bio, ti);
  	return __split_and_process_bio(md, map, bio);
  }
@@ -2200,12 +2150,10 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
  	if (request_based)
  		dm_stop_queue(q);
- if (request_based || md->type == DM_TYPE_NVME_BIO_BASED) {
+	if (request_based) {
  		/*
-		 * Leverage the fact that request-based DM targets and
-		 * NVMe bio based targets are immutable singletons
-		 * - used to optimize both dm_request_fn and dm_mq_queue_rq;
-		 *   and __process_bio.
+		 * Leverage the fact that request-based DM targets are
+		 * immutable singletons - used to optimize dm_mq_queue_rq.
  		 */
  		md->immutable_target = dm_table_get_immutable_target(t);
  	}
@@ -2334,7 +2282,6 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
  		break;
  	case DM_TYPE_BIO_BASED:
  	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
  		dm_init_congested_fn(md);
  		break;
  	case DM_TYPE_NONE:
@@ -3070,7 +3017,6 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_qu
  	switch (type) {
  	case DM_TYPE_BIO_BASED:
  	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
  		pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
  		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
  		io_front_pad = roundup(front_pad,  __alignof__(struct dm_io)) + offsetof(struct dm_io, tio);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a53d7d2c2d95..60631f3abddb 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -28,7 +28,6 @@ enum dm_queue_mode {
  	DM_TYPE_BIO_BASED	 = 1,
  	DM_TYPE_REQUEST_BASED	 = 2,
  	DM_TYPE_DAX_BIO_BASED	 = 3,
-	DM_TYPE_NVME_BIO_BASED	 = 4,
  };
typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t;




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux