On Mon, Feb 6, 2023 at 7:58 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
>
> On 2/6/23 02:00, Hans Holmberg wrote:
> > I think we're missing a flexible way of routing random-ish
> > write workloads on to zoned storage devices. Implementing a UBLK
> > target for this would be a great way to provide zoned storage
> > benefits to a range of use cases. Creating UBLK target would
> > enable us experiment and move fast, and when we arrive
> > at a common, reasonably stable, solution we could move this into
> > the kernel.
> >
> > We do have dm-zoned [3]in the kernel, but it requires a bounce
> > on conventional zones for non-sequential writes, resulting in a write
> > amplification of 2x (which is not optimal for flash).
> >
> > Fully random workloads make little sense to store on ZBDs as a
> > host FTL could not be expected to do better than what conventional block
> > devices do today. Fully sequential writes are also well taken care of
> > by conventional block devices.
> >
> > The interesting stuff is what lies in between those extremes.
> >
> > I would like to discuss how we could use UBLK to implement a
> > common FTL with the right knobs to cater for a wide range of workloads
> > that utilize raw block devices. We had some knobs in the now-dead pblk,
> > a FTL for open channel devices, but I think we could do way better than that.
> >
> > Pblk did not require bouncing writes and had knobs for over-provisioning and
> > workload isolation which could be implemented. We could also add options
> > for different garbage collection policies. In userspace it would also
> > be easy to support default block indirection sizes, reducing logical-physical
> > translation table memory overhead.
> >
> > Use cases for such an FTL includes SSD caching stores such as Apache
> > traffic server [1] and CacheLib[2].
> > CacheLib's block cache and the apache
> > traffic server storage workloads are *almost* zone block device compatible
> > and would need little translation overhead to perform very well on e.g.
> > ZNS SSDs.
> >
> > There are probably more use cases that would benefit.
> >
> > It would also be a great research vehicle for academia. We've used dm-zap
> > for this [4] purpose the last couple of years, but that is not production-ready
> > and cumbersome to improve and maintain as it is implemented as a out-of-tree
> > device mapper.
> >
> > ublk adds a bit of latency overhead, but I think this is acceptable at least
> > until we have a great, proven solution, which could be turned into
> > an in-kernel FTL.
> >
> > If there is interest in the community for a project like this, let's talk!
> >
> > cc:ing the folks who participated in the discussions at ALPSS 2021 and last
> > years' plumbers on this subject.
> >
> > Thanks,
> > Hans
> >
> > [1] https://trafficserver.apache.org/
> > [2] https://cachelib.org/
> > [3] https://docs.kernel.org/admin-guide/device-mapper/dm-zoned.html
> > [4] https://github.com/westerndigitalcorporation/dm-zap
>
> Hi Hans,
>
> Which functionality would such a user space target provide that is not
> yet provided by BTRFS, F2FS or any other log-structured filesystem that
> supports zoned block devices?
>

Hi Bart,

The use cases I'm primarily thinking of are applications and services that
work on top of raw block interfaces, like Apache Traffic Server and CacheLib,
both mentioned in my proposal. These workloads benefit from not using a file
system: the file system overhead is simply too big when storing millions of
large (> 2 KiB) objects and billions of tiny (< 2 KiB) objects.

For the larger objects, the write pattern is log-structured and almost fully
sequential.
Zoned storage would provide a benefit if multiple instances of these caches
were co-located on the same media, which mixes their write streams, or if a
large-object cache were mixed with other, random workloads, like the CacheLib
store for small objects.

Cache workloads have relaxed persistence requirements. It's not the end of the
world if an object disappears.

I can recommend [1] and [2] as an introduction to these workloads. In my
Plumbers talk [3] from last year I sketched out how zoned storage could
benefit object caching on flash.

[1] https://www.usenix.org/conference/osdi20/presentation/berg
[2] https://engineering.fb.com/2021/10/26/core-data/kangaroo/
[3] https://lpc.events/event/16/contributions/1232/attachments/1066/2095/LPC%202022%20Zoned%20MC%20Improving%20object%20caches%20using%20ZNS%20V2.pdf

Cheers,
Hans