> On Mar 2, 2022, at 10:29 PM, Javier González <javier@xxxxxxxxxxx> wrote:
>
> On 03.03.2022 06:32, Javier González wrote:
>>
>>> On 3 Mar 2022, at 04.24, Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>>>
>>> Thinking proactively about LSFMM, regarding just Zoned storage..
>>>
>>> I'd like to propose a BoF for Zoned Storage. The point of it is
>>> to address the existing pain points we have and take advantage of
>>> having folks in the room, so we can likely settle on things faster
>>> which otherwise would take years.
>>>
>>> I'll throw at least one topic out:
>>>
>>> * Raw access for zone append for microbenchmarks:
>>>   - are we really happy with the status quo?
>>>   - if not what outlets do we have?
>>>
>>> I think the nvme passthrough stuff deserves its own shared
>>> discussion though and we should not make it part of the BoF.
>>>
>>> Luis
>>
>> Thanks for proposing this, Luis.
>>
>> I’d like to join this discussion too.
>>
>> Thanks,
>> Javier
>
> Let me expand a bit on this. There is one topic that I would like to
> cover in this session:
>
> - PO2 zone sizes
>   In the past weeks we have been talking to Damien and Matias about
>   the constraint that we currently have for PO2 zone sizes. While
>   this has not been an issue for SMR HDDs, the gap that ZNS
>   introduces between zone capacity and zone size causes holes in the
>   address space. This unmapped LBA space has been the topic of
>   discussion with several ZNS adopters.
>
>   One of the things to note here is that even if the zone size is a
>   PO2, the zone capacity is typically not. This means that even when
>   we can use shifts to move around zones, the actual data placement
>   algorithms need to deal with arbitrary sizes. So at the end of the
>   day, applications that use a contiguous address space, as with a
>   conventional block device, will have to deal with this.
>
>   Since chunk_sectors is no longer required to be a PO2, we have
>   started the work on removing this constraint. We are working in 2
>   phases:
>
>   1. Add an emulation layer in the NVMe driver to simulate PO2 devices
>      when the HW presents a zone_capacity = zone_size. This is a
>      product of one of Damien's early concerns about supporting
>      existing applications and FSs that work under the PO2
>      assumption. We will post these patches in the next few days.
>
>   2. Remove the PO2 constraint from the block layer and add
>      support for arbitrary zone sizes in btrfs. This will allow the
>      raw block device to be presented for arbitrary zone sizes (and
>      capacities) and btrfs will be able to use it natively.
>
>   For completeness, F2FS works natively in PO2 zone sizes, so we
>   will not do work here for now, as the changes will not bring any
>   benefit. For F2FS, the emulation layer will help use devices
>   that do not have PO2 zone sizes.
>
>   We are working towards having at least an RFC of (2) before LSF/MM.
>   Since this is a topic that involves several parties across the
>   stack, I believe that a F2F conversation will help lay the path
>   forward.
>
> Thanks,
> Javier
>

I have been working on Zoned storage for some time as well, and I would
like to be part of this discussion too.

Thanks!

--
Himanshu Madhani	 Oracle Linux Engineering
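
For readers following the PO2 point quoted above, the sketch below illustrates the arithmetic being discussed. It is not taken from the patches mentioned in the thread or from kernel code; the struct and function names are hypothetical, and sizes are assumed to be in logical blocks. It shows that a power-of-two zone size allows a shift to find the zone index, that arbitrary zone sizes need a division, and that mapping a contiguous offset onto the device has to divide by the (typically non-PO2) zone capacity either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical zone geometry in logical blocks; not a kernel structure. */
struct zone_geom {
    uint64_t zone_size;     /* LBA distance between consecutive zone starts */
    uint64_t zone_capacity; /* writable blocks per zone, <= zone_size */
};

static bool is_po2(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Floor of log2(n) for n >= 1, computed portably with a loop. */
static unsigned int ilog2_u64(uint64_t n)
{
    unsigned int shift = 0;

    while (n >>= 1)
        shift++;
    return shift;
}

/* Zone index via shift: only valid when zone_size is a power of two. */
static uint64_t zone_no_po2(const struct zone_geom *g, uint64_t lba)
{
    assert(is_po2(g->zone_size));
    return lba >> ilog2_u64(g->zone_size);
}

/* Zone index for an arbitrary zone size: requires a 64-bit division. */
static uint64_t zone_no_any(const struct zone_geom *g, uint64_t lba)
{
    return lba / g->zone_size;
}

/* True if lba lies in the unmapped hole between capacity and zone size. */
static bool lba_in_hole(const struct zone_geom *g, uint64_t lba)
{
    return lba % g->zone_size >= g->zone_capacity;
}

/*
 * Map an offset in a hole-free, contiguous address space to a device LBA.
 * The division by zone_capacity is needed even when zone_size is a PO2.
 */
static uint64_t contiguous_to_lba(const struct zone_geom *g, uint64_t off)
{
    return (off / g->zone_capacity) * g->zone_size + off % g->zone_capacity;
}
```

With a toy geometry of zone_size = 8 and zone_capacity = 6, LBAs 6 and 7 of every zone fall in the hole, and contiguous offset 13 maps to LBA 17 (zone 2, block 1).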