On 2020/06/26 15:58, Javier González wrote:
> On 26.06.2020 06:42, Damien Le Moal wrote:
>> On 2020/06/26 15:09, Javier González wrote:
>>> On 26.06.2020 01:34, Damien Le Moal wrote:
>>>> On 2020/06/25 21:22, Javier González wrote:
>>>>> From: Javier González <javier.gonz@xxxxxxxxxxx>
>>>>>
>>>>> Add support for offline transition on the zoned block device using the
>>>>> new zone management IOCTL
>>>>>
>>>>> Signed-off-by: Javier González <javier.gonz@xxxxxxxxxxx>
>>>>> Signed-off-by: SelvaKumar S <selvakuma.s1@xxxxxxxxxxx>
>>>>> Signed-off-by: Kanchan Joshi <joshi.k@xxxxxxxxxxx>
>>>>> Signed-off-by: Nitesh Shetty <nj.shetty@xxxxxxxxxxx>
>>>>> ---
>>>>>  block/blk-core.c              | 2 ++
>>>>>  block/blk-zoned.c             | 3 +++
>>>>>  drivers/nvme/host/core.c      | 3 +++
>>>>>  include/linux/blk_types.h     | 3 +++
>>>>>  include/linux/blkdev.h        | 1 -
>>>>>  include/uapi/linux/blkzoned.h | 1 +
>>>>>  6 files changed, 12 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>>> index 03252af8c82c..589cbdacc5ec 100644
>>>>> --- a/block/blk-core.c
>>>>> +++ b/block/blk-core.c
>>>>> @@ -140,6 +140,7 @@ static const char *const blk_op_name[] = {
>>>>>  	REQ_OP_NAME(ZONE_CLOSE),
>>>>>  	REQ_OP_NAME(ZONE_FINISH),
>>>>>  	REQ_OP_NAME(ZONE_APPEND),
>>>>> +	REQ_OP_NAME(ZONE_OFFLINE),
>>>>>  	REQ_OP_NAME(WRITE_SAME),
>>>>>  	REQ_OP_NAME(WRITE_ZEROES),
>>>>>  	REQ_OP_NAME(SCSI_IN),
>>>>> @@ -1030,6 +1031,7 @@ generic_make_request_checks(struct bio *bio)
>>>>>  	case REQ_OP_ZONE_OPEN:
>>>>>  	case REQ_OP_ZONE_CLOSE:
>>>>>  	case REQ_OP_ZONE_FINISH:
>>>>> +	case REQ_OP_ZONE_OFFLINE:
>>>>>  		if (!blk_queue_is_zoned(q))
>>>>>  			goto not_supported;
>>>>>  		break;
>>>>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>>>>> index 29194388a1bb..704fc15813d1 100644
>>>>> --- a/block/blk-zoned.c
>>>>> +++ b/block/blk-zoned.c
>>>>> @@ -416,6 +416,9 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
>>>>>  	case BLK_ZONE_MGMT_RESET:
>>>>>  		op = REQ_OP_ZONE_RESET;
>>>>>  		break;
>>>>> +	case BLK_ZONE_MGMT_OFFLINE:
>>>>> +		op = REQ_OP_ZONE_OFFLINE;
>>>>> +		break;
>>>>>  	default:
>>>>>  		return -ENOTTY;
>>>>>  	}
>>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>>> index f1215523792b..5b95c81d2a2d 100644
>>>>> --- a/drivers/nvme/host/core.c
>>>>> +++ b/drivers/nvme/host/core.c
>>>>> @@ -776,6 +776,9 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req,
>>>>>  	case REQ_OP_ZONE_FINISH:
>>>>>  		ret = nvme_setup_zone_mgmt_send(ns, req, cmd, NVME_ZONE_FINISH);
>>>>>  		break;
>>>>> +	case REQ_OP_ZONE_OFFLINE:
>>>>> +		ret = nvme_setup_zone_mgmt_send(ns, req, cmd, NVME_ZONE_OFFLINE);
>>>>> +		break;
>>>>>  	case REQ_OP_WRITE_ZEROES:
>>>>>  		ret = nvme_setup_write_zeroes(ns, req, cmd);
>>>>>  		break;
>>>>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>>>>> index 16b57fb2b99c..b3921263c3dd 100644
>>>>> --- a/include/linux/blk_types.h
>>>>> +++ b/include/linux/blk_types.h
>>>>> @@ -316,6 +316,8 @@ enum req_opf {
>>>>>  	REQ_OP_ZONE_FINISH	= 12,
>>>>>  	/* write data at the current zone write pointer */
>>>>>  	REQ_OP_ZONE_APPEND	= 13,
>>>>> +	/* Transition a zone to offline */
>>>>> +	REQ_OP_ZONE_OFFLINE	= 14,
>>>>>
>>>>>  	/* SCSI passthrough using struct scsi_request */
>>>>>  	REQ_OP_SCSI_IN		= 32,
>>>>> @@ -456,6 +458,7 @@ static inline bool op_is_zone_mgmt(enum req_opf op)
>>>>>  	case REQ_OP_ZONE_OPEN:
>>>>>  	case REQ_OP_ZONE_CLOSE:
>>>>>  	case REQ_OP_ZONE_FINISH:
>>>>> +	case REQ_OP_ZONE_OFFLINE:
>>>>>  		return true;
>>>>>  	default:
>>>>>  		return false;
>>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>>>> index bd8521f94dc4..8308d8a3720b 100644
>>>>> --- a/include/linux/blkdev.h
>>>>> +++ b/include/linux/blkdev.h
>>>>> @@ -372,7 +372,6 @@ extern int blkdev_zone_ops_ioctl(struct block_device *bdev, fmode_t mode,
>>>>>  				  unsigned int cmd, unsigned long arg);
>>>>>  extern int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
>>>>>  				  unsigned int cmd, unsigned long arg);
>>>>> -
>>>>>  #else /* CONFIG_BLK_DEV_ZONED */
>>>>>
>>>>>  static inline unsigned int blkdev_nr_zones(struct gendisk *disk)
>>>>> diff --git a/include/uapi/linux/blkzoned.h b/include/uapi/linux/blkzoned.h
>>>>> index a8c89fe58f97..d0978ee10fc7 100644
>>>>> --- a/include/uapi/linux/blkzoned.h
>>>>> +++ b/include/uapi/linux/blkzoned.h
>>>>> @@ -155,6 +155,7 @@ enum blk_zone_action {
>>>>>  	BLK_ZONE_MGMT_FINISH	= 0x2,
>>>>>  	BLK_ZONE_MGMT_OPEN	= 0x3,
>>>>>  	BLK_ZONE_MGMT_RESET	= 0x4,
>>>>> +	BLK_ZONE_MGMT_OFFLINE	= 0x5,
>>>>>  };
>>>>>
>>>>>  /**
>>>>>
>>>>
>>>> As mentioned in the previous email, the usefulness of this is dubious. Please
>>>> elaborate in the commit message. Sure, NVMe ZNS defines this and we can support
>>>> it. But without a good use case, what is the point?
>>>
>>> The use case is to transition zones in the read-only state to offline when we
>>> are done moving valid data. It is easier to explicitly manage zones
>>> that are not usable by having them all under the offline state.
>>
>> Then adding a simple BLKZONEOFFLINE ioctl, similar to open, close, finish and
>> reset, would be enough. No need for all the new zone management ioctl with
>> flags plumbing.
>
> Ok. We can add that then.
>
> Note that zone management is not motivated by this use case at all, but
> it made sense to implement it here instead of as a new BLKZONEOFFLINE
> IOCTL as ZAC/ZBC users will not be able to use it either way.

Sure, that is fine. We could actually add that to sd_zbc.c since we have zone
tracking there. A read-only zone can be reported as offline to sync up with
ZNS. The value of it is dubious though, as most applications will treat
read-only and offline zones the same way: as unusable. That is what zonefs
does.

>
>>
>>>
>>>>
>>>> The SCSI sd driver will return BLK_STS_NOTSUPP if this offlining is sent to
>>>> a ZBC/ZAC drive. Not nice. Having a sysfs attribute
>>>> "max_offline_zone_sectors" or the like to indicate support by the device or
>>>> not would be nicer.
>>>
>>> We can do that.
>>>
>>>>
>>>> Does offlining ALL zones make any sense ? Because this patch does not
>>>> prevent the use of the REQ_ZONE_ALL flag introduced in patch 2. Probably not
>>>> a good idea to allow offlining all zones, no ?
>>>
>>> AFAIK the transition to offline is only valid when coming from a
>>> read-only state. I did think of adding a check, but I can see that other
>>> transitions go directly to the driver and then the device, so I decided
>>> to follow the same model. If you think it is better, we can add the
>>> check.
>>
>> My point was that the REQ_ZONE_ALL flag would make no sense for offlining
>> zones, but this patch does not have anything checking that. There is no point
>> in sending a command that is known to be incorrect to the drive...
>
> I will add some extra checks then to fail early. I assume these should
> be in the NVMe driver as it is NVMe-specific, right?

If it is a simple BLKZONEOFFLINE ioctl, it can be processed exactly like open,
close and finish, using blkdev_zone_mgmt(). Calling that one for a range of
sectors spanning more than one zone will likely not make sense most of the
time, but that is allowed for all other ops, so I guess you can keep it as is
for offline too. blkdev_zone_mgmt() will actually not need any change. You
will only need to wire up the ioctl path and update op_is_zone_mgmt(). That's
it. Simple that way.

> Javier

-- 
Damien Le Moal
Western Digital Research