On 1/16/24 15:34, Damien Le Moal wrote:
On 1/17/24 03:20, Bart Van Assche wrote:
File system implementers have to decide whether to use Write or Zone
Append. While the Zone Append command tolerates reordering, it leaves
the filesystem unable to control the order in which data is written to
the medium unless the queue depth is restricted to one.
Additionally, the latency of write operations is lower than that of zone
append operations. From [2], a paper with performance results for one
ZNS SSD model: "we observe that the latency of write operations is lower
than that of append operations, even if the request size is the same".
What is the queue depth for this claim?
Hmm ... I haven't found this in the paper. Maybe I overlooked something.
The mq-deadline I/O scheduler serializes zoned writes even if they were
reordered by the block layer. However, the mq-deadline I/O scheduler,
just like any other single-queue I/O scheduler, is a performance
bottleneck for SSDs that support more than 200 K IOPS. Current NVMe and
UFS 4.0 block devices support more than 200 K IOPS.
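For reference, the scheduler in question is selected per device through
sysfs; a sketch, with "sdX" as a placeholder for the actual device name:

```shell
# Show the available schedulers; the active one is in brackets,
# e.g. "none [mq-deadline]".
cat /sys/block/sdX/queue/scheduler

# Select mq-deadline (currently required for zoned writes at QD > 1).
echo mq-deadline > /sys/block/sdX/queue/scheduler
```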
FYI, I am about to post 20-something patches that completely remove zone write
locking and replace it with "zone write plugging". That is done above the IO
scheduler and also provides zone append emulation for drives that ask for it.
With this change:
- Zone append emulation is moved to the block layer, as a generic
implementation. sd and dm zone append emulation code is removed.
- Any scheduler can be used, including "none". mq-deadline zone block device
special support is removed.
- Overall, a lot less code (the series removes more code than it adds).
- Reordering problems, such as those due to IO priority, are resolved as well.
This will need a lot of testing, which we are working on. But your help with
testing on UFS devices will be appreciated as well.
That sounds very interesting. I can help with reviewing the kernel
patches and also with testing these.
Thanks,
Bart.