On 2020/06/26 16:29, Javier González wrote: > On 26.06.2020 07:09, Damien Le Moal wrote: >> On 2020/06/26 15:55, Javier González wrote: >>> On 26.06.2020 06:49, Damien Le Moal wrote: >>>> On 2020/06/26 15:13, Javier González wrote: >>>>> On 26.06.2020 00:04, Damien Le Moal wrote: >>>>>> On 2020/06/26 6:49, Keith Busch wrote: >>>>>>> On Thu, Jun 25, 2020 at 02:21:52PM +0200, Javier González wrote: >>>>>>>> drivers/nvme/host/zns.c | 7 +++++++ >>>>>>>> 1 file changed, 7 insertions(+) >>>>>>>> >>>>>>>> diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c >>>>>>>> index 7d8381fe7665..de806788a184 100644 >>>>>>>> --- a/drivers/nvme/host/zns.c >>>>>>>> +++ b/drivers/nvme/host/zns.c >>>>>>>> @@ -234,6 +234,13 @@ static int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector, >>>>>>>> sector += ns->zsze * nz; >>>>>>>> } >>>>>>>> >>>>>>>> + if (nr_zones < 0 && zone_idx != ns->nr_zones) { >>>>>>>> + dev_err(ns->ctrl->device, "inconsistent zone count %u/%u\n", >>>>>>>> + zone_idx, ns->nr_zones); >>>>>>>> + ret = -EINVAL; >>>>>>>> + goto out_free; >>>>>>>> + } >>>>>>>> + >>>>>>>> ret = zone_idx; >>>>>>> >>>>>>> nr_zones is unsigned, so it's never < 0. >>>>>>> >>>>>>> The API we're providing doesn't require zone_idx equal the namespace's >>>>>>> nr_zones at the end, though. A subset of the total number of zones can >>>>>>> be requested here. >>>>>>> >>>>> >>>>> I did see nr_zones coming with -1; guess it is my compiler. >>>> >>>> See include/linux/blkdev.h. -1 is: >>>> >>>> #define BLK_ALL_ZONES ((unsigned int)-1) >>>> >>>> Which is documented in block/blk-zoned.c: >>>> >>>> /** >>>> * blkdev_report_zones - Get zones information >>>> * @bdev: Target block device >>>> * @sector: Sector from which to report zones >>>> * @nr_zones: Maximum number of zones to report >>>> * @cb: Callback function called for each reported zone >>>> * @data: Private data for the callback >>>> * >>>> * Description: >>>> * Get zone information starting from the zone containing @sector for at most >>>> * @nr_zones, and call @cb for each zone reported by the device. >>>> * To report all zones in a device starting from @sector, the BLK_ALL_ZONES >>>> * constant can be passed to @nr_zones. >>>> * Returns the number of zones reported by the device, or a negative errno >>>> * value in case of failure. >>>> * >>>> * Note: The caller must use memalloc_noXX_save/restore() calls to control >>>> * memory allocations done within this function. >>>> */ >>>> int blkdev_report_zones(struct block_device *bdev, sector_t sector, >>>> unsigned int nr_zones, report_zones_cb cb, void *data) >>>> >>>>> >>>>>> >>>>>> Yes, absolutely. zone_idx is not an absolute zone number. It is the index of the >>>>>> reported zone descriptor in the current report range requested by the user, >>>>>> which is not necessarily for the entire drive (i.e., provided nr zones is less >>>>>> than the total number of zones of the disk and/or start sector is > 0). So >>>>>> zone_idx indicates the actual number of zones reported, it is not the total >>>>> >>>>> I see. As I can see, when nr_zones comes undefined I believed we could >>>>> assume that zone_idx is absolute, but I can be wrong. >>>> >>>> No. zone_idx is *always* the index of the zone in the current report. Whatever >>>> that report is, regardless of the report starting point and number of zones >>>> requested. E.g. For a single zone report (nr_zones = 1), you will always see >>>> zone_idx = 0. For a full report, zone_idx will correspond to the zone number. >>>> This is used for example in blk_revalidate_disk_zones() to initialize the zone >>>> bitmaps. >>>> >>>>> Does it make sense to support this check with an additional counter and >>>>> a explicit nr_zones initialization when undefined or you >>>>> prefer to just remove it as Matias suggested? >>>> >>>> The check is not needed at all. >>>> >>>> If the device is buggy and reports more zones than the device capacity or any >>>> other bugs, the driver can catch that when it processes the report. >>>> blk_revalidate_disk_zones() also has many checks. >>> >>> I have managed to create a QEMU ZNS device that gave me a headache with >>> a little bit of extra capacity that triggered an additional zone report. >>> This was the motivation for the patch. >> >> The device emulation sound buggy... If the capacity is wrong, then the report >> will be too since zones are all supposed to be sequential (no holes between >> zones) and up to the disk capacity only (last zone start + len = capacity + 1) >> >> If one or the other is wrong, this should be easy to detect. Normally, >> blk_revalidate_disk_zones() should be able to catch that. > > We have the capability to select the reported device capacity manually > for a number of reasons. One of the different test configurations in our > CI did go through. If you change the drive capacity on the fly (e.g. with a low level format ?), you must revalidate the disk/drive to get the changed capacity. A lot of things will break otherwise This is not just report zones that will be incorrect. > > But it is OK, I will remove the check on V2. > > Javier > -- Damien Le Moal Western Digital Research