Re: [PATCH 3/6] block: add support for zone offline transition

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 26.06.2020 07:17, Damien Le Moal wrote:
On 2020/06/26 15:58, Javier González wrote:
On 26.06.2020 06:42, Damien Le Moal wrote:
On 2020/06/26 15:09, Javier González wrote:
On 26.06.2020 01:34, Damien Le Moal wrote:
On 2020/06/25 21:22, Javier González wrote:
From: Javier González <javier.gonz@xxxxxxxxxxx>

Add support for offline transition on the zoned block device using the
new zone management IOCTL

Signed-off-by: Javier González <javier.gonz@xxxxxxxxxxx>
Signed-off-by: SelvaKumar S <selvakuma.s1@xxxxxxxxxxx>
Signed-off-by: Kanchan Joshi <joshi.k@xxxxxxxxxxx>
Signed-off-by: Nitesh Shetty <nj.shetty@xxxxxxxxxxx>
---
 block/blk-core.c              | 2 ++
 block/blk-zoned.c             | 3 +++
 drivers/nvme/host/core.c      | 3 +++
 include/linux/blk_types.h     | 3 +++
 include/linux/blkdev.h        | 1 -
 include/uapi/linux/blkzoned.h | 1 +
 6 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 03252af8c82c..589cbdacc5ec 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -140,6 +140,7 @@ static const char *const blk_op_name[] = {
 	REQ_OP_NAME(ZONE_CLOSE),
 	REQ_OP_NAME(ZONE_FINISH),
 	REQ_OP_NAME(ZONE_APPEND),
+	REQ_OP_NAME(ZONE_OFFLINE),
 	REQ_OP_NAME(WRITE_SAME),
 	REQ_OP_NAME(WRITE_ZEROES),
 	REQ_OP_NAME(SCSI_IN),
@@ -1030,6 +1031,7 @@ generic_make_request_checks(struct bio *bio)
 	case REQ_OP_ZONE_OPEN:
 	case REQ_OP_ZONE_CLOSE:
 	case REQ_OP_ZONE_FINISH:
+	case REQ_OP_ZONE_OFFLINE:
 		if (!blk_queue_is_zoned(q))
 			goto not_supported;
 		break;
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 29194388a1bb..704fc15813d1 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -416,6 +416,9 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
 	case BLK_ZONE_MGMT_RESET:
 		op = REQ_OP_ZONE_RESET;
 		break;
+	case BLK_ZONE_MGMT_OFFLINE:
+		op = REQ_OP_ZONE_OFFLINE;
+		break;
 	default:
 		return -ENOTTY;
 	}
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f1215523792b..5b95c81d2a2d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -776,6 +776,9 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req,
 	case REQ_OP_ZONE_FINISH:
 		ret = nvme_setup_zone_mgmt_send(ns, req, cmd, NVME_ZONE_FINISH);
 		break;
+	case REQ_OP_ZONE_OFFLINE:
+		ret = nvme_setup_zone_mgmt_send(ns, req, cmd, NVME_ZONE_OFFLINE);
+		break;
 	case REQ_OP_WRITE_ZEROES:
 		ret = nvme_setup_write_zeroes(ns, req, cmd);
 		break;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 16b57fb2b99c..b3921263c3dd 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -316,6 +316,8 @@ enum req_opf {
 	REQ_OP_ZONE_FINISH	= 12,
 	/* write data at the current zone write pointer */
 	REQ_OP_ZONE_APPEND	= 13,
+	/* Transition a zone to offline */
+	REQ_OP_ZONE_OFFLINE	= 14,

 	/* SCSI passthrough using struct scsi_request */
 	REQ_OP_SCSI_IN		= 32,
@@ -456,6 +458,7 @@ static inline bool op_is_zone_mgmt(enum req_opf op)
 	case REQ_OP_ZONE_OPEN:
 	case REQ_OP_ZONE_CLOSE:
 	case REQ_OP_ZONE_FINISH:
+	case REQ_OP_ZONE_OFFLINE:
 		return true;
 	default:
 		return false;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bd8521f94dc4..8308d8a3720b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -372,7 +372,6 @@ extern int blkdev_zone_ops_ioctl(struct block_device *bdev, fmode_t mode,
 				  unsigned int cmd, unsigned long arg);
 extern int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
 				  unsigned int cmd, unsigned long arg);
-
 #else /* CONFIG_BLK_DEV_ZONED */

 static inline unsigned int blkdev_nr_zones(struct gendisk *disk)
diff --git a/include/uapi/linux/blkzoned.h b/include/uapi/linux/blkzoned.h
index a8c89fe58f97..d0978ee10fc7 100644
--- a/include/uapi/linux/blkzoned.h
+++ b/include/uapi/linux/blkzoned.h
@@ -155,6 +155,7 @@ enum blk_zone_action {
 	BLK_ZONE_MGMT_FINISH	= 0x2,
 	BLK_ZONE_MGMT_OPEN	= 0x3,
 	BLK_ZONE_MGMT_RESET	= 0x4,
+	BLK_ZONE_MGMT_OFFLINE	= 0x5,
 };

 /**


As mentioned in previous email, the usefulness of this is dubious. Please
elaborate in the commit message. Sure NVMe ZNS defines this and we can support
it. But without a good use case, what is the point ?

Use case is to transition zones in read-only state to offline when we
are done moving valid data. It is easier to explicitly managing zones
that are not usable by having all under the offline state.

Then adding a simple BLKZONEOFFLINE ioctl, similar to open, close, finish and
reset, would be enough. No need for all the new zone management ioctl with flags
plumbing.

Ok. We can add that then.

Note that zone management is not motivated by this use case at all, but
it made sense to implement it here instead of as a new BLKZONEOFFLINE
IOCTL as ZAC/ZBC users will not be able to use it either way.

Sure, that is fine. We could actually add that to sd_zbc.c since we have zone
tracking there. A read-only zone can be reported as offline to sync-up with zns.
The value of it is dubious though as most applications will treat read-only and
offline zones the same way: as unusable. That is what zonefs does.

Ok.






scsi SD driver will return BLK_STS_NOTSUPP if this offlining is sent to a
ZBC/ZAC drive. Not nice. Having a sysfs attribute "max_offline_zone_sectors" or
the like to indicate support by the device or not would be nicer.

We can do that.


Does offling ALL zones make any sense ? Because this patch does not prevent the
use of the REQ_ZONE_ALL flags introduced in patch 2. Probably not a good idea to
allow offlining all zones, no ?

AFAIK the transition to offline is only valid when coming from a
read-only state. I did think of adding a check, but I can see that other
transitions go directly to the driver and then the device, so I decided
to follow the same model. If you think it is better, we can add the
check.

My point was that the REQ_ZONE_ALL flag would make no sense for offlining zones
but this patch does not have anything checking that. There is no point in
sending a command that is known to be incorrect to the drive...

I will add some extra checks then to fail early. I assume these should
be in the NVMe driver as it is NVMe-specific, right?

If it is a simple BLKZONEOFFLINE ioctl, it can be processed exactly like open,
close and finish, using blkdev_zone_mgmt(). Calling that one for a range of
sectors of more than one zone will likely not make any sense most of the time,
but that is allowed for all other ops, so I guess you can keep it as is for
offline too. blkdev_zone_mgmt() will actually not need any change. You will only
need to wire the ioctl path and update op_is_zone_mgmt(). That's it. Simple that
way.

Sounds good.

Javier



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux