Re: [LSF/MM/BPF TOPIC] durability vs performance for flash devices (especially embedded!)


 



On 6/9/21 8:16 PM, Damien Le Moal wrote:
On 2021/06/10 3:47, Bart Van Assche wrote:
On 6/9/21 11:30 AM, Matthew Wilcox wrote:
maybe you should read the paper.

" This comparison demonstrates that using F2FS, a flash-friendly file
system, does not mitigate the wear-out problem, except inasmuch as it
inadvertently rate limits all I/O to the device"
It seems like my email was not clear enough? What I tried to make clear
is that I think that there is no way to solve the flash wear issue with
the traditional block interface. I think that F2FS in combination with
the zone interface is an effective solution.

What is also relevant in this context is that the "Flash drive lifespan
is a problem" paper was published in 2017. I think that the first
commercial SSDs with a zone interface became available at a later time
(summer of 2020?).
Yes, zone support in the block layer and f2fs was added with kernel 4.10,
released in Feb 2017. So the authors likely did not consider it as a solution,
especially since at the time it was all about SMR HDDs only. Now we do have ZNS,
and things like SD-Express coming which may allow NVMe/ZNS on even the cheapest
of consumer devices.

That said, I do not think that f2fs is an ideal solution as is, since all of
its metadata needs to be updated in place and so is subject to the drive's
implementation of FTL/wear leveling. And the quality of that varies between
devices and vendors...

btrfs zone support improves on that, as even the super blocks are not updated
in place on zoned devices. Everything is copy-on-write, written sequentially
into zones. While the current block allocator is rather simple for now, it
could eventually be tweaked to add some wear-leveling awareness (per-zone wear
leveling is much easier to do inside the drive though, so the host should not
need to care).

In the context of zoned storage, the discussion could be around how best to
support file systems. Do we keep modifying one file system after another to
support zones, or to implement wear leveling? That is *very* hard to do and
sometimes not reasonably feasible, depending on the FS design.

I do remember Dave Chinner's talk back at LSF/MM 2018 (was it?) where he
discussed the idea of moving block allocation out of file systems and into a
kind of library common to many of them. In the context of consumer flash wear
leveling, and eventually zones (likely with some remapping needed), this may
be something interesting to discuss again.

Some of the other bits that make this hard in the embedded space include layering on top of device mapper - using dm-verity, for example - and our usual problem of apps that drive too many small I/Os down to the device to service sqlite transactions.

Looking to get some measurements done to show the write amplification - measure the total amount of writes done by applications - and what that translates into in device requests. Anything done for metadata, logging, etc. all counts as "write amplification" when viewed this way.
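The device-request side of that comparison can be sketched from the per-device I/O counters in sysfs. This is just an illustrative sketch, not a tool we have: the field layout follows the kernel's Documentation/block/stat.rst (7th field is sectors written, always in 512-byte units), and the counter values below are made up for the example.

```python
# Sketch: estimate write amplification from /sys/block/<dev>/stat snapshots
# taken before and after a workload run.
# Per Documentation/block/stat.rst, the 7th field is "sectors written",
# and a "sector" in these counters is always 512 bytes.

def sectors_written(stat_line: str) -> int:
    """Extract the sectors-written counter from one /sys/block/<dev>/stat line."""
    return int(stat_line.split()[6])

def write_amplification(app_bytes: int, stat_before: str, stat_after: str) -> float:
    """Device-level bytes written divided by application-level bytes written."""
    device_bytes = (sectors_written(stat_after) - sectors_written(stat_before)) * 512
    return device_bytes / app_bytes

if __name__ == "__main__":
    # Made-up counter values: the device wrote 2048 sectors (1 MiB)
    # while the application reported writing 512 KiB -> amplification of 2.0.
    before = "1000 50 8000 120 2000 100 16000 300 0 400 420"
    after  = "1000 50 8000 120 2100 105 18048 310 0 410 430"
    print(write_amplification(512 * 1024, before, after))  # -> 2.0
```

The app-side number would come from instrumenting the workload itself (or from per-process counters in /proc/<pid>/io), which is where the metadata/logging overhead shows up as the gap between the two.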

Useful to try and figure out what the best-case durability of parts would be for specific workloads.
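The best-case arithmetic is simple enough to sketch: rated endurance divided by actual device-level writes per day. The TBW rating, daily write volume, and amplification factor below are made-up illustrative numbers, not measurements.

```python
# Sketch: best-case lifespan estimate from a part's TBW (terabytes written)
# endurance rating, application write volume, and a measured write
# amplification factor. All example numbers are hypothetical.

def best_case_lifespan_days(rated_tbw: float, app_gb_per_day: float,
                            write_amp: float) -> float:
    """Days until the rated endurance is consumed at the current write rate."""
    device_gb_per_day = app_gb_per_day * write_amp  # what the flash actually absorbs
    return rated_tbw * 1000 / device_gb_per_day

if __name__ == "__main__":
    # Hypothetical part rated for 200 TBW, apps writing 20 GB/day,
    # with an overall measured write amplification of 3x.
    print(round(best_case_lifespan_days(200, 20, 3)))  # -> 3333 days (~9 years)
```

That is best-case in the sense that it ignores the drive's internal (FTL-level) amplification, which is why the device-side measurement below matters.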

Measuring write amplification inside the device is often possible as well, so we could end up getting a pretty clear picture.

Regards,

Ric






