On Tue, Sep 17, 2024 at 01:54:20PM -0700, Darrick J. Wong wrote:
> On Mon, Sep 16, 2024 at 11:24:56AM +0100, John Garry wrote:
> > On 16/09/2024 08:03, Dave Chinner wrote:
> > > OTOH, we can't do this with atomic writes. Atomic writes require
> > > some mkfs help because they require explicit physical alignment of
> > > the filesystem to the underlying storage.
>
> Forcealign requires agsize%extsize==0,

No it doesn't. AG size is irrelevant when aligning extents - all that matters is that we can find a free extent that can be trimmed to the alignment defined by extsize.

> it's atomicwrites that adds the
> requirement that extsize be a power of 2...

Only by explicit implementation constraint.

Atomic writes do not require power of two extsize - they only require correctly aligned physical extents. e.g. an 8kB atomic write is always guaranteed to succeed if we have an extsize of 24kB for laying out the physical extents, because a physical extent aligned to 24kB is always 8kB aligned and is an exact multiple of 8kB in length. This meets the requirements for 8kB atomic writes to always succeed, and hence there is no fundamental requirement for extsize to be a power of 2.

We have *chosen* to simplify the implementation by only allowing a single aligned atomic write to be issued at a time. This means the alignment and size of atomic writes are always the minimum size the hardware advertises, and that is (at present) always a power of 2.

Hence the "extsize needs to be a power of 2" requirement comes from the constraints exposed by the block device configuration (i.e. the minimum atomic write unit), not from a filesystem design or implementation constraint.

At the filesystem level, we have further simplified things by only allowing extsize = atomic write size. Hence the initial implementation ends up only supporting power of 2 extsize values. This is not a hard design or implementation constraint, however.

.....

hmmmmm.

.....
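The 8kB atomic write / 24kB extsize argument above is plain arithmetic and can be checked mechanically. What follows is a minimal userspace sketch, not XFS code: the helper names are hypothetical, sizes are in bytes, and it assumes every physical extent starts at a device offset that is a multiple of extsize. Under that assumption the only requirement is extsize % awu == 0; the brute-force walk cross-checks that every awu-sized unit inside each extent is naturally aligned and never straddles an extent boundary, whether or not extsize is a power of 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an XFS function). Sizes are in bytes.
 * Assumes each physical extent starts at a device offset that is a
 * multiple of extsize. Then extent starts are awu-aligned and extents
 * split into whole awu units exactly when extsize % awu == 0.
 * Nothing here requires extsize to be a power of 2.
 */
static bool extsize_supports_atomic(uint64_t extsize, uint64_t awu)
{
	if (extsize == 0 || awu == 0)
		return false;
	return extsize % awu == 0;
}

/*
 * Brute-force cross-check of the rule above: walk the first
 * nr_extents extsize-aligned extents and verify that every awu-sized
 * unit is naturally aligned and never crosses the end of its extent.
 */
static bool check_layout(uint64_t extsize, uint64_t awu,
			 unsigned int nr_extents)
{
	assert(extsize != 0 && awu != 0);

	for (unsigned int i = 0; i < nr_extents; i++) {
		uint64_t start = (uint64_t)i * extsize;	/* extsize-aligned */
		uint64_t end = start + extsize;

		for (uint64_t off = start; off < end; off += awu) {
			if (off % awu != 0 || off + awu > end)
				return false;
		}
	}
	return true;
}
```

For example, extsize_supports_atomic(24576, 8192) holds even though 24576 is not a power of 2, whereas a 16kB atomic write unit does not fit a 24kB extsize because the second 16kB unit would cross the extent boundary.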
In writing this I've had further thoughts on force-align and the sub-alloc-unit unwritten extent issue we've been discussing here. i.e. I've stopped and considered the existing design constraints given what I wrote above, and considered what is needed to support large extsize for small atomic writes.

I think we need to support large extsize with small atomic write size for two reasons:

1. extsize is still going to be needed to prevent excessive fragmentation with atomic writes. It's the small DIO write workloads that see lots of fragmentation, and applications using atomic writes are essentially being forced down the path of being small DIO write workloads.

2. we can allow force-align w/o atomic writes behaviour to match the existing rtvol sb_rextsize > 1 fsb behaviour without impacting atomic write behaviour (i.e. fewer behavioural differences, more common code, better performance, etc).

To do this (and I think we do want to do this), we have to do two things:

1. force-align needs to add an "unwritten align" inode parameter to allow sub-extsize unwritten extent boundaries to exist in the BMBT (i.e. similar to how rt files w/ sb_rextsize > 1 fsb currently behave).

   This is purely an in-memory value - for pure "force-align" we can set it to 1 fsb and then the behaviour will match existing RT behaviour. We can abstract this behaviour by replacing the hard coded 1 block alignment for unwritten conversion with an "unwritten align" value which would initially be set to 1.

   We can also abstract this code away from being "rt specific" and make it entirely dependent on "alloc-unit" configuration. This means rt, force-align and atomic write will all be running the same code, which makes testing a lot easier.

2. inodes with atomic write enabled must set the unwritten align value to the atomic write size exposed by the hardware, and the extsize must be an exact integer multiple of the unwritten align size.
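A rough userspace sketch of the proposed "unwritten align" abstraction, covering the two items above. Everything in it is hypothetical: the struct, field and function names are illustrative and are not existing XFS code. Units are filesystem blocks, and the example values assume 4kB blocks, so an 8kB atomic write unit is 2 blocks and a 24kB extsize is 6 blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-inode allocation-unit configuration, in fs blocks. */
struct alloc_unit_cfg {
	uint32_t extsize;		/* allocation unit (extent size hint) */
	uint32_t unwritten_align;	/* granularity of unwritten conversion */
};

/*
 * extsize must be an exact integer multiple of unwritten_align.
 * Returns 0 if the configuration is usable, -1 otherwise.
 */
static int alloc_unit_cfg_check(const struct alloc_unit_cfg *cfg)
{
	if (cfg->extsize == 0 || cfg->unwritten_align == 0)
		return -1;
	return (cfg->extsize % cfg->unwritten_align == 0) ? 0 : -1;
}

/*
 * Instead of converting unwritten extents with a hard-coded 1 block
 * alignment, round the range [*start, *start + *len) outward to
 * unwritten_align boundaries. unwritten_align == 1 leaves the range
 * untouched; unwritten_align == atomic write unit keeps every
 * unwritten/written boundary on an atomic write boundary.
 */
static void round_unwritten_range(const struct alloc_unit_cfg *cfg,
				  uint64_t *start, uint64_t *len)
{
	uint64_t ua = cfg->unwritten_align;
	uint64_t end = *start + *len;

	assert(ua != 0);
	*start -= *start % ua;			/* round start down */
	end = (end + ua - 1) / ua * ua;		/* round end up */
	*len = end - *start;
}
```

With unwritten_align = 1 (pure force-align) the rounding is a no-op, as with today's per-block conversion. With unwritten_align = 2 and extsize = 6, a 1-block write at block 3 converts blocks [2, 4), i.e. one aligned 8kB atomic unit inside the 24kB extent, leaving the rest of the extent unwritten.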
An initial setting of unwritten align == extsize gives the current behaviour of all allocation and extent conversion being aligned to atomic write constraints. Separating unwritten conversion from the extsize then allows the situation I described above with 8kB atomic writes and 24kB extsize. Because unwritten conversion is aligned to atomic write boundaries, we can use sub-alloc-unit unwritten extents without violating atomic write boundaries.

This would allow us to use extsize for atomic writes in the same manner we use it now - to enable large contiguous allocations that prevent fragmentation when doing lots of concurrent small "immediate write" operations across many files.

I think this can all be added on top of the existing patchset - it's not really a fundamental change to any of it. It's a little bit more abstraction and unification, but it enables a lot more flexibility for optimising atomic write functionality in the future.

Thoughts?

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx