On Mon, 2020-06-01 at 20:34 +0800, Coly Li wrote: > [...] > > > + > > > + /* convert back to LBA of the bcache device*/ > > > + zone->start -= data_offset; > > > + zone->wp -= data_offset; > > > > This has to be done depending on the zone type and zone condition: zone->wp is > > "invalid" for conventional zones, and sequential zones that are full, read-only > > or offline. So you need something like this: > > > > /* Remap LBA to the bcache device */ > > zone->start -= data_offset; > > switch(zone->cond) { > > case BLK_ZONE_COND_NOT_WP: > > case BLK_ZONE_COND_READONLY: > > case BLK_ZONE_COND_FULL: > > case BLK_ZONE_COND_OFFLINE: > > break; > > case BLK_ZONE_COND_EMPTY: > > zone->wp = zone->start; > > break; > > default: > > zone->wp -= data_offset; > > break; > > } > > > > return args->orig_cb(zone, idx, args->orig_data); > > > > Hmm, do we have a unified spec to handle the wp on different zone > condition ? This comes from SCSI ZBC and ATA ZAC specifications, version r05 of the documents for both (ratified specifications). These 2 specifications define identical semantic and states for SMR zones. Upcoming NVMe ZNS has similar and compatible zone management too. > In zonefs I see zone->wp sets to zone->start for, > - BLK_ZONE_COND_OFFLINE > - BLK_ZONE_COND_READONLY This is because zonefs uses a zone write pointer position for the file size: file size = zone->wp - zone->start; For the read-only and offline zone conditions, since the wp given by the drive is defined as "invalid", so unknown, zonefs changes the wp to zone start to make the file size 0 and prevent any read IO issuing. > > In sd_zbc.c, I see wp sets to end of the zone for > - BLK_ZONE_COND_FULL This is only to be nice to host software users, so that the wp position relative to the zone start is exactly the zone size, indicating that the entire zone was written. This allows to have equivalence between the tests zone->cond == BLK_ZONE_COND_FULL and zone->wp == zone->start + zone->len to make coding easier. But the fact remain that the value given for the wp of a full zone by the disk is undefined. Generally, drives return 0xfffff...ffff. > And in dm.c I see zone->wp is set to zone->start for, > - BLK_ZONE_COND_EMPTY That is because like bcache, dm may remap zones to indicate sector values that make sense for the target device. The values given by the physical underlying disk cannot be used as is. The code snippet above I sent does the same for the same reason. > It seems except for BLK_ZONE_COND_NOT_WP, for other conditions the > writer pointer should be taken cared by device driver already. The physical drive device driver gives you what the disk indicated, modulo the change for full zone. Bcache *is* a device driver too, driver of the target device. The report zones method for that device needs to take care of the zone remapping if needed. That is not for the underlying physical device driver lower in the stack to do this as that layer does not know how the drive is being used. > So I write such switch-case in the following way by the inspair of your > comments, > > /* Zone 0 should not be reported */ > if(WARN_ON_ONCE(zone->start < data_offset)) > return -EIO; That depends on the start sector specified when blkdev_report_zones() is called. Beware to have the start sector being >= zone size. That is, that report start sector needs remapping too. > /* convert back to LBA of the bcache device*/ > zone->start -= data_offset; > if (zone->cond != BLK_ZONE_COND_NOT_WP) > zone->wp -= data_offset; Since you have the BLK_ZONE_COND_NOT_WP condition in the switch-case, this is not needed. > > switch (zone->cond) { > case BLK_ZONE_COND_NOT_WP: > /* No write pointer available */ > break; > case BLK_ZONE_COND_EMPTY: zone->wp = zone->start; needs to ba added here so that your wp is remapped. > case BLK_ZONE_COND_READONLY: > case BLK_ZONE_COND_OFFLINE: > /* > * If underlying device driver does not properly > * set writer pointer, warn and fix it. > */ > if (WARN_ON_ONCE(zone->wp != zone->start)) > zone->wp = zone->start; This is not needed. The wp value is undefined for these conditions. touching it, looking at it or even thinking of it does not make sense :) So leave the wp as is for these 2 cases. > break; > case BLK_ZONE_COND_FULL: > /* > * If underlying device driver does not properly > * set writer pointer, warn and fix it. > */ > if (WARN_ON_ONCE(zone->wp != zone->start + zone->len)) > zone->wp = zone->start + zone->len; Simply unconditionally set the wp to zone->start + zone->len. No WARN ON needed at all. I actually forgot this case in the code snippet I sent. > break; > default: > break; > } > > return args->orig_cb(zone, idx, args->orig_data); > > The above code converts zone->wp by minus data_offset for > non-BLK_ZONE_COND_NOT_WP condition. And for other necessary conditions, > I just check whether the underlying device driver updates write pointer > properly (as I observed in other drivers), if not then drop a warning > and fix the write pointer to the expected value. The wp essentially comes from the drive. Except for a full zone, the value is passed on as is by the driver. The physical device driver does not "set" the wp. > Now the write pointer is only fixed when it was wrong value. If the > underlying device driver does not maintain the value properly, we figure > out and fix it. The wp will be a wrong value only if the drive you are using has a firmware bug. The "fixing" needed is because bcache device needs remapping (off-by-one-zone) of the physical device zone information. That is for bcache to do. And that remapping needs to be done carefully since some wp values are undefined. Your code above as is will for instance change the wp value from "undefined" (0xffffffffffffffff for drives out there, generally, but it could be anything) to "undefined - data_offset" for any readonly or offline zone. The result is still "undefined"... Doing that kind of operation on the wp is not necessary at all and does not serve any useful purpose. -- Damien Le Moal Western Digital Research