Re: [LSF/MM/BPF TOPIC] extsize and forcealign design in filesystems for atomic writes

Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> · Fri, 7 Feb 2025 11:38:08 +0530

On Tue, Feb 04, 2025 at 12:20:25PM +0000, John Garry wrote:
> On 01/02/2025 07:12, Ojaswin Mujoo wrote:
> 
> Hi Ojaswin,
> 
> > > For my test case, I am trying 16K atomic writes with 4K FS block size, so I
> > > expect the software fallback to not kick in often after running the system
> > > for a while (as eventually we will get aligned allocations). I am
> > > concerned about the prospect of heavily fragmented files, though.
> > Yes that's true, if the FS is up long enough there is bound to be
> > fragmentation eventually which might make it harder for extsize to
> > get the blocks.
> > 
> > With software fallback, there's again the point that many FSes will need
> > some sort of COW/exchange_range support before they can support anything
> > like that.
> > 
> > Although I've not looked at what it will take to add that to
> > ext4 but I'm assuming it will not be trivial at all.
> 
> Sure, but then again you may not have issues with getting forcealign support
> accepted for ext4. However, I would have thought that bigalloc was good
> enough to use initially.

Yes, bigalloc is indeed good enough as a start, but eventually
something like forcealign will be beneficial, as not everyone prefers an
FS-wide cluster-size allocation granularity.

We do have a patch for atomic writes with bigalloc that was sent way
back in mid 2024, but then we ran into the same discussion of mixed
mapping[1].

Hmm I think it might be time to revisit that and see if we can do
something better there.

[1] https://lore.kernel.org/linux-ext4/37baa9f4c6c2994df7383d8b719078a527e521b9.1729825985.git.ritesh.list@xxxxxxxxx/
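
To illustrate what I mean by mixed mapping, here is a rough userspace
sketch. It assumes that "mixed" means the atomic write range is not backed
by one single extent (for example part written, part unwritten). The struct
and function names are made up for illustration and are not taken from
ext4 or from the patch in [1].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent states, for illustration only. */
enum ext_state { EXT_HOLE, EXT_UNWRITTEN, EXT_WRITTEN };

/* Hypothetical in-memory extent record; units are FS blocks. */
struct extent {
	uint64_t lblk;          /* first logical block of the extent */
	uint64_t len;           /* length of the extent in blocks */
	enum ext_state state;   /* carried for context, not checked below */
};

/*
 * Return true if the block range [lblk, lblk + len) lies entirely inside
 * one extent of the map. If the range crosses an extent boundary, or its
 * first block is not covered by any extent, return false: that is the
 * "mixed mapping" case in this sketch.
 */
static bool range_is_single_mapping(const struct extent *map, size_t n,
				    uint64_t lblk, uint64_t len)
{
	assert(len != 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t end = map[i].lblk + map[i].len;

		/* Find the extent holding the first block of the range. */
		if (lblk >= map[i].lblk && lblk < end)
			return lblk + len <= end;
	}
	return false;
}
```

With a 4K block size, a 16K atomic write is 4 blocks, so a file whose first
two blocks are written and next two are unwritten would fail this check for
a write at offset 0.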
> 
> > 
> > > > I agree that forcealign is not the only way we can have atomic writes
> > > > work but I do feel there is value in having forcealign for FSes and
> > > > hence we should have a discussion around it so we can get the interface
> > > > right.
> > > > 
> > > I thought that the interface for forcealign according to the candidate xfs
> > > implementation was quite straightforward. no?
> > As mentioned in the original proposal, there are still a few open problems
> > around extsize and forcealign.
> > 
> > - The allocation and deallocation semantics are not completely clear to
> > 	me. For example, we allow operations like unaligned punch_hole but not
> > 	unaligned insert and collapse range, and I couldn't see that
> > 	documented anywhere.
> 
> For xfs, we were imposing the same restrictions as which we have for
> rtextsize > 1.
> 
> If you check the following:
> https://lore.kernel.org/linux-xfs/20240813163638.3751939-9-john.g.garry@xxxxxxxxxx/
> 
> You can see how the large allocunit value is affected by forcealign, and
> then check callers of xfs_is_falloc_aligned() -> xfs_inode_alloc_unitsize()
> to see how this affects some fallocate modes.

True, but it's something that just implicitly happens when we use
forcealign. I eventually found this out while testing forcealign with
different operations, but such things can come as a surprise to users,
especially when we allow some operations to be unaligned and then
reject other similar ones.

punch_hole/collapse_range is just an example and, yes, it might not be
very important to support unaligned collapse range, but in the long run
it would be good to have these things documented/discussed.
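
To make the asymmetry concrete, here is a rough userspace model of the
behaviour I observed. This is only a sketch: the enum and function names
are hypothetical and this is not the actual XFS code, it just encodes
"unaligned punch_hole is accepted, unaligned insert/collapse range is
rejected" against a given allocation unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation names, for illustration only. */
enum falloc_op { OP_PUNCH_HOLE, OP_COLLAPSE_RANGE, OP_INSERT_RANGE };

/* True if both offset and length are multiples of the allocation unit. */
static bool range_is_aligned(uint64_t off, uint64_t len, uint64_t alloc_unit)
{
	assert(alloc_unit != 0);
	return (off % alloc_unit) == 0 && (len % alloc_unit) == 0;
}

/*
 * Model of the observed behaviour (an assumption, not kernel code):
 * punch_hole is allowed on any range, while collapse and insert range
 * must be aligned to the allocation unit.
 */
static bool op_allowed(enum falloc_op op, uint64_t off, uint64_t len,
		       uint64_t alloc_unit)
{
	switch (op) {
	case OP_PUNCH_HOLE:
		return true;
	case OP_COLLAPSE_RANGE:
	case OP_INSERT_RANGE:
		return range_is_aligned(off, len, alloc_unit);
	}
	return false;
}
```

So with a 16K allocation unit, punching a 4K hole at offset 4K goes
through, but collapsing 16K at offset 4K does not. This is the sort of
rule I'd like to see written down somewhere.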
> 
> > 
> > - There are challenges in extsize with delayed allocation as well as how
> > 	the tooling should handle forcealigned inodes.
> 
> Yeah, maybe. I was only testing my xfs forcealign solution for dio (and no
> delayed alloc).
> 
> > 
> > - How are FSes supposed to behave when forcealign/extsize is used with
> > 	other FS features that change the allocation granularity, like bigalloc
> > 	or rtvol?
> 
> As you would expect, they need to be aligned with one another.
> 
> For example, in the case of xfs rtvol, rextsize needs to be a multiple of
> extsize when forcealign is enabled. Or the other way around, I forget now..
> 
> > 
> > I agree that XFS's implementation is a good reference but I'm
> > sure as I continue working on the same from the ext4 perspective we will
> > have more points of discussion. So I definitely feel that it's worth
> > discussing this at LSFMM.
> 
> Understood, but I'll wait to see where my CoW-based method for XFS goes
> before commenting on what needs to be discussed for xfs.

Got it.
> 
> > 
> > > What was not clear was the age-old issue of how to issue an atomic write of
> > > mixed extents, which is really an atomic write issue.
> > Right, btw are you planning any talk for atomic writes at LSFMM?
> 
> I hadn't planned on it, but I guess that Martin will add something to the
> agenda.
> 
> Thanks,
> John
>