On Fri, Nov 18, 2022 at 01:35:29PM +0900, Damien Le Moal wrote:
> On 11/18/22 13:12, Ming Lei wrote:
> [...]
> >>> You can only assign it to zoned write requests, but you still have
> >>> to check the sequence inside each zone, right? Then why not just
> >>> check LBAs in each zone directly?
> >>
> >> We would need to know the zone map, which is not otherwise required.
> >> Then we would need to track the write pointer for each open zone for
> >> each queue, so that we can stall writes that are not issued at the
> >> write pointer. This is in effect all zones, because we cannot track
> >> when zones are implicitly closed. Then, if different queues are
> >> issuing writes to
> >
> > Can you explain the "implicitly closed" state a bit?
> >
> > From https://zonedstorage.io/docs/introduction/zoned-storage, the
> > closed state is only mentioned in the following words:
> >
> > ```Conversely, implicitly or explicitly opened zones can be
> > transitioned to the closed state using the CLOSE ZONE command.```
>
> When a write is issued to an empty or closed zone, the drive will
> automatically transition the zone into the implicit open state. This
> is called an implicit open because the host did not (explicitly) issue
> an open zone command.
>
> When there are too many implicitly open zones, the drive may choose to
> close one of the implicitly opened zones in order to implicitly open
> the zone that is the target of the incoming write command.
>
> Simple, in a nutshell. This is done so that the drive can work with
> the limited set of resources needed to handle open zones, that is,
> zones that are being written. There are some more nasty details to all
> this with limits on the number of open zones and active zones that a
> zoned drive may have.

OK, thanks for the clarification about the implicit close. My
understanding is that this close can't change the zone's write pointer.

> > Zone info can be cached in a mapping (a hash table with the zone
> > start sector as the key and the zone info as the value), which can
> > be managed in LRU style. If the zone info isn't found in the mapping
> > table, ioctl(BLKREPORTZONE) can be called to obtain it.
> >
> >> the same zone, we need to sync across queues. Userspace may have
> >> synchronization in place to issue writes with multiple threads
> >> while still hitting the write pointer.
> >
> > You can trust mq-deadline, which guarantees that write IO is sent to
> > ->queue_rq() in order, no matter MQ or SQ.
> >
> > Yes, writes could be issued from multiple queues by ublksrv, which
> > doesn't sync among multiple queues.
> >
> > But per-zone re-ordering can still solve the issue; we just need one
> > lock per zone to cover the MQ re-ordering.
>
> That lock is already there, and with it mq-deadline will never
> dispatch more than one write per zone at any time. This is to avoid
> write reordering. So, multi queue or not, for any zone there is no
> possibility of writes being reordered.

Oops, I missed the point about the single queue depth per zone, so ublk
won't break zoned writes at all. I agree that the ordering of batched
IOs is one problem, but it is not hard to solve; rough sketches of both
ideas are appended below.

Thanks,
Ming
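
To make the zone-info cache idea above concrete, here is a minimal
userspace sketch of the lookup-on-miss path. Only the BLKREPORTZONE
ioctl and the struct blk_zone / struct blk_zone_report layout come from
<linux/blkzoned.h>; the cache-entry structure and the function name are
made up for illustration, and the hash/LRU linkage is omitted:

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

struct zcache_entry {
	__u64		start;	/* zone start sector, the hash key */
	struct blk_zone	zone;	/* cached zone descriptor (incl. wp) */
	/* hash table and LRU list linkage omitted for brevity */
};

/* Fetch the descriptor of the zone containing @sector on a cache miss. */
static int zone_report_one(int fd, __u64 sector, struct blk_zone *zone)
{
	struct {
		struct blk_zone_report	hdr;
		struct blk_zone		zones[1];
	} rep;

	memset(&rep, 0, sizeof(rep));
	rep.hdr.sector = sector;	/* report starts from this sector */
	rep.hdr.nr_zones = 1;		/* we only need one descriptor */

	if (ioctl(fd, BLKREPORTZONE, &rep.hdr) < 0)
		return -1;
	if (rep.hdr.nr_zones != 1)
		return -1;		/* sector past the last zone */

	*zone = rep.zones[0];		/* caller inserts this into the cache */
	return 0;
}
```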
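
And here is what the per-zone lock / re-order idea could look like,
again just a hypothetical sketch (none of these names exist in
ublksrv): each zone carries a lock and the next expected write sector,
and a write arriving out of order is parked on a per-zone list until
the write pointer catches up, e.g. when the preceding write completes:

```c
#include <pthread.h>
#include <stdbool.h>
#include <linux/types.h>

struct zwrite {
	__u64		sector;		/* target LBA of this write */
	__u32		nr_sectors;	/* length of this write */
	struct zwrite	*next;		/* pending-list linkage */
};

struct zone_state {
	pthread_mutex_t	lock;		/* covers wp and pending */
	__u64		wp;		/* next sector we expect to write */
	struct zwrite	*pending;	/* out-of-order writes parked here */
};

/*
 * Returns true if @w may be issued now. Otherwise @w is parked and
 * should be resubmitted once the preceding write completes and has
 * advanced ->wp.
 */
static bool zone_submit_write(struct zone_state *zs, struct zwrite *w)
{
	bool issue;

	pthread_mutex_lock(&zs->lock);
	issue = (w->sector == zs->wp);
	if (issue) {
		zs->wp += w->nr_sectors;
	} else {
		/* real code would keep this list sorted by ->sector */
		w->next = zs->pending;
		zs->pending = w;
	}
	pthread_mutex_unlock(&zs->lock);
	return issue;
}
```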