Re: [LSF/MM/BPF TOPIC] extsize and forcealign design in filesystems for atomic writes

On 29/01/2025 07:06, Ojaswin Mujoo wrote:

Hi Ojaswin,


I would like to submit a proposal to discuss the design of extsize and
forcealign and various open questions around it.

  ** Background **

Modern NVMe/SCSI disks with atomic write capabilities allow a write to a
multi-KB range on disk to complete atomically. This feature has a wide variety
of use cases, especially for databases like MySQL and PostgreSQL that can
leverage atomic writes to gain significant performance. However, to use atomic
writes on Linux, the upper layers such as filesystems must follow the size and
alignment constraints that the underlying disk may impose. extsize with
forcealign is one of the ways filesystems can make sure the IO submitted to the
disk adheres to the atomic write constraints.
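
To make the size and alignment constraints concrete, here is a minimal sketch
of the kind of check a caller could run before issuing an atomic write. It
assumes the rules documented for RWF_ATOMIC: the length is a power of 2, lies
within the advertised minimum and maximum atomic write unit, and the offset is
naturally aligned to the length. The function names and the unit_min/unit_max
values are hypothetical and only illustrate the arithmetic.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of 2."""
    return n > 0 and (n & (n - 1)) == 0


def atomic_write_ok(offset: int, length: int,
                    unit_min: int, unit_max: int) -> bool:
    """Check a single write against assumed atomic write constraints.

    unit_min and unit_max stand in for the limits a device/filesystem
    would advertise; the values used below are made up for illustration.
    """
    if not is_power_of_two(length):
        return False
    if length < unit_min or length > unit_max:
        return False
    # Natural alignment: the offset must be a multiple of the write length.
    return offset % length == 0


# Hypothetical limits: 4 KiB minimum, 64 KiB maximum.
UNIT_MIN, UNIT_MAX = 4096, 65536
print(atomic_write_ok(16384, 16384, UNIT_MIN, UNIT_MAX))  # aligned 16 KiB write
print(atomic_write_ok(4096, 16384, UNIT_MIN, UNIT_MAX))   # offset not aligned
```

The filesystem's part of the problem is to make sure that a write passing
such a check at the file level also maps to suitably aligned blocks on disk.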

extsize is a hint to the FS to allocate extents at a certain logical alignment
and size. forcealign builds on this by forcing the allocator to enforce the
alignment guarantees for physical blocks as well, which is essential for atomic
writes.
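
The difference between the two can be sketched with a simplified model of an
extent. This is only an illustration under stated assumptions: the Extent
class, its fields and both helper functions are hypothetical, units are
filesystem blocks, and real allocators track considerably more state.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    logical_start: int   # offset within the file, in blocks
    physical_start: int  # starting block on disk
    length: int          # number of blocks


def satisfies_extsize(ext: Extent, extsize: int) -> bool:
    """Logical alignment only: file offset and length are extsize multiples."""
    return ext.logical_start % extsize == 0 and ext.length % extsize == 0


def satisfies_forcealign(ext: Extent, extsize: int) -> bool:
    """Logical alignment plus physical alignment of the starting disk block."""
    return satisfies_extsize(ext, extsize) and ext.physical_start % extsize == 0


# extsize of 4 blocks: same logical layout, different physical placement.
misplaced = Extent(logical_start=8, physical_start=1003, length=4)
aligned = Extent(logical_start=8, physical_start=1004, length=4)
print(satisfies_extsize(misplaced, 4), satisfies_forcealign(misplaced, 4))
print(satisfies_extsize(aligned, 4), satisfies_forcealign(aligned, 4))
```

In this model the first extent honours the extsize hint but could not back an
atomic write of 4 blocks, because its physical start is not aligned.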

  ** Points of discussion **

The extsize hints feature is already supported by XFS [1], with forcealign
still under development and discussion [2].

From the thread at
https://lore.kernel.org/linux-xfs/20241212013433.GC6678@frogsfrogsfrogs/
the alternative to forcealign for XFS is to use a software-emulated fallback
for unaligned atomic writes. I am looking at a PoC implementation now. Note
that this does rely on CoW.

There has been pushback on forcealign for XFS, so we need to prove or disprove
that this software-emulated fallback can work, see
https://lore.kernel.org/linux-xfs/20240924061719.GA11211@xxxxxx/

After taking a look at ext4's multi-block allocator design, supporting extsize
with forcealign can be done in ext4 as well. An RFC has been proposed that adds
support for the extsize hints feature in ext4 [3]. However, there are some
caveats and deviations from the XFS design. With these in mind, I would like to
propose an LSFMM topic on:

  * The exact semantics of extsize with forcealign, so that ext4, XFS and any
    other FS that plans to implement them in the future can offer a consistent
    interface.

  * Documenting how forcealign with extsize should behave with various FS
    operations like fallocate, truncate, punch hole, insert/collapse range etc.

  * Implementing extsize with delayed allocation and the challenges there.

  * Discussing tooling support for forcealign, such as how we plan to maintain
    block alignment guarantees during fsck, resize and other times where we
    might need to move blocks around.

  * Documenting any areas where FSes might differ in their implementations of
    the same. For example, ext4 doesn't plan to support non-power-of-2 extsizes
    whereas XFS has support for that.

Hopefully this discussion will be relevant in defining consistent semantics for
extsize hints and forcealign, which may also be useful to other FS developers.

Thoughts and suggestions are welcome.

References:
[1] https://man7.org/linux/man-pages/man2/ioctl_xfs_fsgetxattr.2.html
[2] https://lore.kernel.org/linux-xfs/20240813163638.3751939-1-john.g.garry@xxxxxxxxxx/
[3] https://lore.kernel.org/linux-ext4/cover.1733901374.git.ojaswin@xxxxxxxxxxxxx/

Regards,
ojaswin




