On 2020/03/11 1:46, Christoph Hellwig wrote:
> On Tue, Mar 10, 2020 at 06:46:51PM +0900, Johannes Thumshirn wrote:
>> From: Damien Le Moal <damien.lemoal@xxxxxxx>
>>
>> Not all zoned block devices natively support the zone append command.
>> E.g. SCSI and ATA disks do not define this command. However, it is
>> fairly straightforward to emulate this command at the LLD level using
>> regular write commands if a zone write pointer position is known.
>> Introducing such emulation enables the use of zone append write for all
>> device types, therefore simplifying for instance the implementation of
>> file systems zoned block device support by avoiding the need for
>> different write pathes depending on the device capabilities.
>
> I'd much rather have this in the driver itself than in the block layer.
> Especially as sd will hopefully remain the only users.

Yes, I agree with you here. That would be nicer, but early attempts to
do so failed: we always ended up with potential races on the number of
zones and the wp array size when the device changes or is revalidated.

Moving the wp array allocation and initialization to
blk_revalidate_disk_zones() greatly simplifies the code and removes the
races, as all updates to the zone bitmaps, the wp array and the number
of zones are done together under a single queue freeze. Moving only the
wp array to sd_zbc, even using a queue freeze, leads to potential
out-of-bounds accesses for the wp array.

Another undesirable side effect of moving the wp array initialization to
sd_zbc is that we would need another full drive zone report after the
full report that blk_revalidate_disk_zones() already does. That is
costly. On 20TB SMR disks with more than 75000 zones, the added delay is
significant. Doing all initialization within the full zone report loop
of blk_revalidate_disk_zones() avoids that added overhead.

-- 
Damien Le Moal
Western Digital Research
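
For illustration only, the bookkeeping discussed above can be modelled in a few lines of user-space C. This is not code from the patch series: `struct zone_table` and the `zone_table_*` / `zone_append_emulate` names are hypothetical, and the comment about the queue freeze only marks where the kernel would need that protection. The sketch shows two things, assuming a per-zone write pointer array: an append emulated as a regular write at the cached write pointer, and a revalidation that replaces the zone count and the wp array together so the bounds check never sees a mismatched pair.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct zone_table {
    uint32_t nr_zones;      /* number of zones on the device */
    uint64_t zone_sectors;  /* zone size in sectors */
    uint64_t *wp;           /* write pointer offset within each zone */
};

/* Allocate a wp array sized for nr_zones, all zones empty. */
static int zone_table_init(struct zone_table *zt, uint32_t nr_zones,
                           uint64_t zone_sectors)
{
    assert(zt != NULL);
    zt->wp = calloc(nr_zones, sizeof(*zt->wp));
    if (!zt->wp)
        return -1;
    zt->nr_zones = nr_zones;
    zt->zone_sectors = zone_sectors;
    return 0;
}

/*
 * Emulate a zone append with a regular write at the cached write pointer.
 * Returns the start sector of the write, or -1 if the zone number is out
 * of range or the data does not fit in the zone.
 */
static int64_t zone_append_emulate(struct zone_table *zt, uint32_t zno,
                                   uint64_t nr_sectors)
{
    uint64_t start;

    if (zno >= zt->nr_zones)
        return -1;
    if (nr_sectors > zt->zone_sectors - zt->wp[zno])
        return -1;

    start = (uint64_t)zno * zt->zone_sectors + zt->wp[zno];
    zt->wp[zno] += nr_sectors;  /* advance the cached write pointer */
    return (int64_t)start;
}

/*
 * Revalidation: build the new array first, then replace the count and
 * the array in one step. In the kernel this swap is what would have to
 * happen under the queue freeze.
 */
static int zone_table_revalidate(struct zone_table *zt, uint32_t nr_zones,
                                 uint64_t zone_sectors)
{
    struct zone_table fresh;

    if (zone_table_init(&fresh, nr_zones, zone_sectors))
        return -1;
    free(zt->wp);
    *zt = fresh;
    return 0;
}
```

If `nr_zones` and `wp` were updated at different times, `zone_append_emulate()` could pass its bounds check against the old count and then index the new, smaller array. That is the kind of out-of-bounds access described above.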