On Wed, Jan 29, 2025 at 08:59:15AM +0000, John Garry wrote:
> On 29/01/2025 07:06, Ojaswin Mujoo wrote:
>
> Hi Ojaswin,
>
> > I would like to submit a proposal to discuss the design of extsize and
> > forcealign and various open questions around it.
> >
> > ** Background **
> >
> > Modern NVMe/SCSI disks with atomic write capabilities can allow writes to a
> > multi-KB range on disk to go atomically. This feature has a wide variety of use
> > cases especially for databases like mysql and postgres that can leverage atomic
> > writes to gain significant performance. However, in order to enable atomic
> > writes on Linux, the underlying disk may have some size and alignment
> > constraints that the upper layers like filesystems should follow. extsize with
> > forcealign is one of the ways filesystems can make sure the IO submitted to the
> > disk adheres to the atomic writes constraints.
> >
> > extsize is a hint to the FS to allocate extents at a certain logical alignment
> > and size. forcealign builds on this by forcing the allocator to enforce the
> > alignment guarantees for physical blocks as well, which is essential for atomic
> > writes.
> >
> > ** Points of discussion **
> >
> > Extsize hints feature is already supported by XFS [1] with forcealign still
> > under development and discussion [2].
>
> From
> https://lore.kernel.org/linux-xfs/20241212013433.GC6678@frogsfrogsfrogs/
> thread, the alternate solution to forcealign for XFS is to use a
> software-emulated fallback for unaligned atomic writes. I am looking at a
> PoC implementation now. Note that this does rely on CoW.
>
> There has been push back on forcealign for XFS, so we need to prove/disprove
> that this software-emulated fallback can work, see
> https://lore.kernel.org/linux-xfs/20240924061719.GA11211@xxxxxx/
>

Hey John,

Thanks for taking a look. I did go through the two series some time back.
I agree that there are some open challenges in getting the multi-block
atomic write interface correct, especially for mixed mappings, and this
is one of the main reasons we want to explore the exchange_range
fallback in case blocks are not aligned.

That being said, I believe forcealign as a feature still holds a lot of
relevance as:

1. Right now, it is the only way to guarantee aligned blocks and hence
   guarantee that our atomic writes can always benefit from hardware
   atomic write support. IIUC DBs are not very keen on losing out on
   performance due to some writes going via the software fallback path.

2. Not all FSes support COW (a major example being ext4), and hence it
   will be very difficult to have a software fallback in case the blocks
   are not aligned.

3. As pointed out in [1], even with exchange_range there is still value
   in having forcealign to find the new blocks to be exchanged.

I agree that forcealign is not the only way we can have atomic writes
work, but I do feel there is value in having forcealign for FSes and
hence we should have a discussion around it so we can get the interface
right.

Just to be clear, the intention of this proposal is mainly to discuss
forcealign as a feature. I am hoping there will be a separate proposal
to discuss atomic writes and the plethora of other open challenges
there ;)

[1] https://lore.kernel.org/linux-xfs/20250117182945.GH1611770@frogsfrogsfrogs/

> > After taking a look at ext4's multi-block
> > allocator design, supporting extsize with forcealign can be done in ext4 as
> > well. There is a RFC proposed which adds support for extsize hints feature in
> > ext4 [3]. However there are some caveats and deviations from XFS design. With
> > these in mind, I would like to propose LSFMM topic on:
> >
> > * exact semantics of extsize w/ forcealign which can bring a consistent
> >   interface among ext4 and xfs and possibly any other FS that plans to
> >   implement them in the future.
> >
> > * Documenting how forcealign with extsize should behave with various FS
> >   operations like fallocate, truncate, punch hole, insert/collapse range etc
> >
> > * Implementing extsize with delayed allocation and the challenges there.
> >
> > * Discussing tooling support of forcealign like how are we planning to maintain
> >   block alignment guarantees during fsck, resize and other times where we might
> >   need to move blocks around?
> >
> > * Documenting any areas where FSes might differ in their implementations of the
> >   same. Example, ext4 doesn't plan to support non power of 2 extsizes whereas
> >   XFS has support for that.
> >
> > Hopefully this discussion will be relevant in defining consistent semantics for
> > extsize hints and forcealign which might as well come useful for other FS
> > developers too.
> >
> > Thoughts and suggestions are welcome.
> >
> > References:
> > [1] https://man7.org/linux/man-pages/man2/ioctl_xfs_fsgetxattr.2.html
> > [2] https://lore.kernel.org/linux-xfs/20240813163638.3751939-1-john.g.garry@xxxxxxxxxx/
> > [3] https://lore.kernel.org/linux-ext4/cover.1733901374.git.ojaswin@xxxxxxxxxxxxx/
> >
> > Regards,
> > ojaswin
>
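
For readers less familiar with the two terms, the difference between an
extsize hint (logical alignment and size) and forcealign (physical
alignment as well) can be sketched with a toy model. This is only an
illustration under simplifying assumptions: the Extent class and both
helper functions are hypothetical names, all units are filesystem
blocks, and none of it is taken from the XFS or ext4 allocator code or
from the series linked above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical model of one file extent, in filesystem blocks."""
    logical_start: int   # first file block covered by the extent
    physical_start: int  # first disk block backing the extent
    length: int          # number of blocks in the extent


def is_logically_aligned(ext: Extent, extsize: int) -> bool:
    """What an extsize hint asks for: the extent starts on an extsize
    boundary in the file and its length is a multiple of extsize."""
    return ext.logical_start % extsize == 0 and ext.length % extsize == 0


def is_force_aligned(ext: Extent, extsize: int) -> bool:
    """What forcealign adds in this toy model: the backing disk blocks
    must also start on an extsize boundary."""
    return is_logically_aligned(ext, extsize) and ext.physical_start % extsize == 0


# Logically aligned, but the disk blocks start at 1003, which is not a
# multiple of 4, so the physical alignment requirement is not met.
hint_only = Extent(logical_start=8, physical_start=1003, length=4)

# Same file range, backed by disk blocks starting on a multiple of 4.
forced = Extent(logical_start=8, physical_start=1004, length=4)
```

In this model only the second extent could back a 4-block write that
relies on the device's alignment constraints; the first one satisfies
the hint in the file's address space but not on disk.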