On 11/18/22 13:12, Ming Lei wrote:
[...]
>>> You can only assign it to zoned write requests, but you still have to
>>> check the sequence inside each zone, right? Then why not just check
>>> LBAs in each zone simply?
>>
>> We would need to know the zone map, which is not otherwise required.
>> Then we would need to track the write pointer for each open zone for
>> each queue, so that we can stall writes that are not issued at the write
>> pointer. This is in effect all zones, because we cannot track when zones
>> are implicitly closed. Then, if different queues are issuing writes to
>
> Can you explain the "implicitly closed" state a bit?
>
> From https://zonedstorage.io/docs/introduction/zoned-storage, only the
> following words are mentioned about the closed state:
>
> ```Conversely, implicitly or explicitly opened zoned can be transitioned to the
> closed state using the CLOSE ZONE command.```

When a write is issued to an empty or closed zone, the drive will
automatically transition the zone into the implicit open state. This is
called an implicit open because the host did not (explicitly) issue an
open zone command. When there are too many implicitly open zones, the
drive may choose to close one of the implicitly opened zones in order to
implicitly open the zone that is the target of the incoming write
command. That is it in a nutshell. This is done so that the drive can
work with the limited set of resources it needs to handle open zones,
that is, zones that are being written. There are some more nasty details
to all this, with limits on the number of open zones and active zones
that a zoned drive may have.

> zone info can be cached in a mapping (hash table, with the zone start
> sector as the key and the zone info as the value), which can be
> implemented in an LRU style. If a zone's info is not found in the
> mapping table, ioctl(BLKREPORTZONE) can be called to obtain it.
>
>> the same zone, we need to sync across queues.
>> Userspace may have synchronization in place to issue writes with
>> multiple threads while still hitting the write pointer.
>
> You can trust mq-deadline, which guarantees that write IO is sent to
> ->queue_rq() in order, no matter MQ or SQ.
>
> Yes, it could be issued from multiple queues for ublksrv, which does
> not sync among multiple queues.
>
> But a per-zone re-order still can solve the issue; we just need one
> lock per zone to cover the MQ re-ordering.

That lock is already there: using it, mq-deadline will never dispatch
more than one write per zone at any time, precisely to avoid write
reordering. So, multi-queue or not, for any zone, there is no
possibility of writes being reordered.

-- 
Damien Le Moal
Western Digital Research