On Tue, Mar 15, 2022 at 02:14:23PM +0000, Johannes Thumshirn wrote:
> On 15/03/2022 14:52, Javier González wrote:
> > On 15.03.2022 14:30, Christoph Hellwig wrote:
> >> On Tue, Mar 15, 2022 at 02:26:11PM +0100, Javier González wrote:
> >>> but we do not see a usage for ZNS in F2FS, as it is a mobile
> >>> file-system. As other interfaces arrive, this work will become natural.
> >>>
> >>> ZoneFS and btrfs are good targets for ZNS and these we can do. I would
> >>> still do the work in phases to make sure we have enough early feedback
> >>> from the community.
> >>>
> >>> Since this thread has been very active, I will wait some time for
> >>> Christoph and others to catch up before we start sending code.
> >>
> >> Can someone summarize where we stand? Between the lack of quoting
> >> from hell and overly long lines from corporate mail clients I've
> >> mostly stopped reading this thread because it takes too much effort
> >> to actually extract the information.
> >
> > Let me give it a try:
> >
> > - PO2 emulation in NVMe is a no-go. Drop this.
> >
> > - The arguments against removing the PO2 constraint are:
> >     - It makes ZNS depart from the SMR assumption of PO2 zone sizes.
> >       This can create confusion for users of both SMR and ZNS.
> >
> >     - Existing applications assume PO2 zone sizes, and probably
> >       optimize for them. These applications, if they want to use ZNS,
> >       will have to change their calculations.
> >
> >     - There is a fear of performance regressions.
> >
> >     - It adds more work for you and other maintainers.
> >
> > - The arguments in favour of removing the PO2 constraint are:
> >     - Unmapped LBAs create holes that applications need to deal with.
> >       This affects mapping and performance due to splits. Bo explained
> >       this in a thread from Bytedance's perspective. I explained in an
> >       answer to Matias how we are not letting zones transition to
> >       offline in order to simplify the host stack. Not sure if this is
> >       something we want to bring to NVMe.
> >
> >     - As ZNS adds more features and other protocols add support for
> >       zoned devices, we will have more use-cases for the zoned block
> >       device. We will have to deal with this fragmentation at some
> >       point.
> >
> >     - This is used in production workloads on Linux hosts. I would
> >       advocate for this not living out-of-tree, as that will be a
> >       headache for all in the future.
> >
> > - If you agree that removing the PO2 constraint is an option, we can
> >   do the following:
> >     - Remove the constraint in the block layer and add ZoneFS support
> >       in a first patch.
> >
> >     - Add btrfs support in a later patch.
>
> (+ linux-btrfs )
>
> Please also make sure to support btrfs and not only throw some patches
> over the fence. Zoned device support in btrfs is complex enough and has
> quite some special casing vs regular btrfs, which we're working on
> getting rid of. So having a non-power-of-2 zone size would also mean
> having NPO2 block-groups (and thus block-groups not aligned to the
> stripe size).
>
> Just thinking of this and knowing I need to support it gives me a
> headache.

PO2 is really easy to work with, and I guess allocation on the physical
device could also benefit from it, so I'm still puzzled why NPO2 is even
proposed. We can possibly hide the calculations behind some API, so I
hope it will be bearable in the end. The size of block groups is
flexible; we only want some reasonable alignment.
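To make the "hide the calculations behind some API" idea a bit more
concrete, here is a rough userspace sketch of the kind of helper I have
in mind. The names (zone_geom, zone_number, zone_offset) are made up
for this mail; nothing like this exists in btrfs today:

#include <stdint.h>

/* Describe the zone geometry once, so callers never open-code the math. */
struct zone_geom {
        uint64_t zone_sectors;  /* zone size in 512B sectors */
        int      zone_shift;    /* ilog2(zone_sectors) if PO2, else -1 */
};

/* Which zone does this sector land in? */
static inline uint64_t zone_number(const struct zone_geom *g, uint64_t sector)
{
        if (g->zone_shift >= 0)
                return sector >> g->zone_shift;   /* PO2: cheap shift */
        return sector / g->zone_sectors;          /* NPO2: 64-bit division */
}

/* Offset of this sector within its zone. */
static inline uint64_t zone_offset(const struct zone_geom *g, uint64_t sector)
{
        if (g->zone_shift >= 0)
                return sector & (g->zone_sectors - 1);  /* PO2: mask */
        return sector % g->zone_sectors;                /* NPO2: modulo */
}

With something like that the PO2 case keeps its shift/mask fast path and
the NPO2 case pays a division, so the performance question stays in one
place instead of being spread all over the allocator.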
> Also please consult the rest of the btrfs developers for thoughts on
> this. After all btrfs has full zoned support (including ZNS, not saying
> it's perfect) and is also the default FS for at least two Linux
> distributions.

I haven't read the whole thread yet, but my impression is that some
hardware is deliberately breaking existing assumptions about zoned
devices and, in turn, breaking btrfs support. I hope I'm wrong about
that, or at least that it's possible to work around it.
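If I read the summary above correctly, the assumption being broken is
that a zone's capacity equals its (power-of-2) zone size: the drives in
question report a smaller capacity, so the rounded-up tail of every zone
is an unmapped hole that sequential users have to split I/O around. A
toy illustration with made-up numbers, not taken from any particular
drive:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* Hypothetical geometry: PO2 zone size, smaller usable capacity. */
        const uint64_t zone_size = 2048ULL << 20;  /* 2048 MiB, PO2 */
        const uint64_t zone_cap  = 1792ULL << 20;  /* 1792 MiB usable */

        for (uint64_t z = 0; z < 3; z++) {
                uint64_t start = z * zone_size;

                printf("zone %llu: usable [%llu, %llu), hole [%llu, %llu)\n",
                       (unsigned long long)z,
                       (unsigned long long)start,
                       (unsigned long long)(start + zone_cap),
                       (unsigned long long)(start + zone_cap),
                       (unsigned long long)(start + zone_size));
        }
        return 0;
}

With an NPO2 zone size equal to the capacity the hole goes away, which
seems to be the argument in favour; the cost is that the index math
turns into division, which is the argument against.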