On Fri, Nov 18, 2022 at 10:41:31AM +0100, Andreas Hindborg wrote: > > Ming Lei <ming.lei@xxxxxxxxxx> writes: > > > CAUTION: This email originated from outside of Western Digital. Do not click on > > links or open attachments unless you recognize the sender and know that the > > content is safe. > > > > > > On Fri, Nov 18, 2022 at 01:35:29PM +0900, Damien Le Moal wrote: > >> On 11/18/22 13:12, Ming Lei wrote: > >> [...] > >> >>> You can only assign it to zoned write request, but you still have to check > >> >>> the sequence inside each zone, right? Then why not just check LBAs in > >> >>> each zone simply? > >> >> > >> >> We would need to know the zone map, which is not otherwise required. > >> >> Then we would need to track the write pointer for each open zone for > >> >> each queue, so that we can stall writes that are not issued at the write > >> >> pointer. This is in effect all zones, because we cannot track when zones > >> >> are implicitly closed. Then, if different queues are issuing writes to > >> > > >> > Can you explain "implicitly closed" state a bit? > >> > > >> > From https://zonedstorage.io/docs/introduction/zoned-storage, only the > >> > following words are mentioned about closed state: > >> > > >> > ```Conversely, implicitly or explicitly opened zoned can be transitioned to the > >> > closed state using the CLOSE ZONE command.``` > >> > >> When a write is issued to an empty or closed zone, the drive will > >> automatically transition the zone into the implicit open state. This is > >> called implicit open because the host did not (explicitly) issue an open > >> zone command. > >> > >> When there are too many implicitly open zones, the drive may choose to > >> close one of the implicitly opened zone to implicitly open the zone that > >> is a target for a write command. > >> > >> Simple in a nutshell. This is done so that the drive can work with a > >> limited set of resources needed to handle open zones, that is, zones that > >> are being written. There are some more nasty details to all this with > >> limits on the number of open zones and active zones that a zoned drive may > >> have. > > > > OK, thanks for the clarification about implicitly closed, but I > > understand this close can't change the zone's write pointer. > > You are right, it does not matter if the zone is implicitly closed, I > was mistaken. But we still have to track the write pointer of every zone > in open or active state, otherwise we cannot know if a write that arrive > to a zone with no outstanding IO is actually at the write pointer, or > whether we need to hold it. > > > > >> > >> > > >> > zone info can be cached in the mapping(hash table)(zone sector is the key, and zone > >> > info is the value), which can be implemented as one LRU style. If any zone > >> > info isn't hit in the mapping table, ioctl(BLKREPORTZONE) can be called for > >> > obtaining the zone info. > >> > > >> >> the same zone, we need to sync across queues. Userspace may have > >> >> synchronization in place to issue writes with multiple threads while > >> >> still hitting the write pointer. > >> > > >> > You can trust mq-dealine, which guaranteed that write IO is sent to ->queue_rq() > >> > in order, no matter MQ or SQ. > >> > > >> > Yes, it could be issue from multiple queues for ublksrv, which doesn't sync > >> > among multiple queues. > >> > > >> > But per-zone re-order still can solve the issue, just need one lock > >> > for each zone to cover the MQ re-order. > >> > >> That lock is already there and using it, mq-deadline will never dispatch > >> more than one write per zone at any time. This is to avoid write > >> reordering. So multi queue or not, for any zone, there is no possibility > >> of having writes reordered. > > > > oops, I miss the single queue depth point per zone, so ublk won't break > > zoned write at all, and I agree order of batch IOs is one problem, but > > not hard to solve. > > The current implementation _does_ break zoned write because it reverses > batched writes. But if it is an easy fix, that is cool :) Please look at Damien's comment: >> That lock is already there and using it, mq-deadline will never dispatch >> more than one write per zone at any time. This is to avoid write >> reordering. So multi queue or not, for any zone, there is no possibility >> of having writes reordered. For zoned write, mq-deadline is used to limit at most one inflight write for each zone. So can you explain a bit how the current implementation breaks zoned write? Thanks, Ming