On 11/18/22 13:12, Ming Lei wrote:
[...]
>>> You can only assign it to zoned write requests, but you still have to
>>> check the sequence inside each zone, right? Then why not just check
>>> LBAs in each zone simply?
>>
>> We would need to know the zone map, which is not otherwise required.
>> Then we would need to track the write pointer for each open zone for
>> each queue, so that we can stall writes that are not issued at the write
>> pointer. This is in effect all zones, because we cannot track when zones
>> are implicitly closed. Then, if different queues are issuing writes to
>
> Can you explain the "implicitly closed" state a bit?
>
> From https://zonedstorage.io/docs/introduction/zoned-storage, only the
> following words are mentioned about the closed state:
>
> ```Conversely, implicitly or explicitly opened zoned can be transitioned to the
> closed state using the CLOSE ZONE command.```

When a write is issued to an empty or closed zone, the drive will
automatically transition the zone into the implicit open state. This is
called an implicit open because the host did not (explicitly) issue an
open zone command. When there are too many implicitly open zones, the
drive may choose to close one of the implicitly opened zones in order to
implicitly open the zone that is the target of the incoming write
command. That is it in a nutshell. This is done so that the drive can
work with the limited set of resources it needs to handle open zones,
that is, zones that are being written. There are some more nasty details
to all this, with limits on the number of open zones and active zones
that a zoned drive may have.

> zone info can be cached in a mapping (hash table, with the zone start
> sector as the key and the zone info as the value), which can be
> implemented in an LRU style. If a zone's info is not found in the
> mapping table, ioctl(BLKREPORTZONE) can be called to obtain it.
>
>> the same zone, we need to sync across queues.
>> Userspace may have synchronization in place to issue writes with
>> multiple threads while still hitting the write pointer.
>
> You can trust mq-deadline, which guarantees that write IO is sent to
> ->queue_rq() in order, no matter MQ or SQ.
>
> Yes, it could be issued from multiple queues for ublksrv, which does
> not sync among multiple queues.
>
> But a per-zone re-order still can solve the issue; we just need one
> lock per zone to cover the MQ re-ordering.

That lock is already there: using it, mq-deadline will never dispatch
more than one write per zone at any time, precisely to avoid write
reordering. So, multi-queue or not, for any zone, there is no
possibility of writes being reordered.

-- 
Damien Le Moal
Western Digital Research