Re: [LSF/MM/BPF TOPIC] extsize and forcealign design in filesystems for atomic writes


 



On 29/01/2025 16:06, Ojaswin Mujoo wrote:
On Wed, Jan 29, 2025 at 08:59:15AM +0000, John Garry wrote:
On 29/01/2025 07:06, Ojaswin Mujoo wrote:

Hi Ojaswin,


I would like to submit a proposal to discuss the design of extsize and
forcealign and various open questions around it.

   ** Background **

Modern NVMe/SCSI disks with atomic write capabilities can write a multi-KB
range to disk atomically. This feature has a wide variety of use cases,
especially for databases like MySQL and PostgreSQL, which can leverage atomic
writes to gain significant performance. However, to use atomic writes on
Linux, the underlying disk may impose size and alignment constraints that
upper layers like filesystems must follow. extsize with forcealign is one of
the ways a filesystem can make sure the IO submitted to the disk adheres to
the atomic write constraints.
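
To make the size and alignment constraints concrete, here is a minimal sketch
of the kind of check involved. It assumes the rules as I understand them for
Linux atomic writes: the length is a power of two, it lies between a minimum
and maximum atomic write unit, and the offset is naturally aligned to the
length. The function names and the 4 KiB / 16 KiB limits are hypothetical
illustration values, not taken from any particular device.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... and False for everything else."""
    return n > 0 and (n & (n - 1)) == 0


def atomic_write_ok(offset: int, length: int, unit_min: int, unit_max: int) -> bool:
    """Check (offset, length) in bytes against the assumed atomic write rules."""
    if not is_power_of_two(length):
        return False
    if not (unit_min <= length <= unit_max):
        return False
    # Natural alignment: the offset must be a multiple of the write length.
    return offset % length == 0


# Hypothetical limits for illustration only.
UNIT_MIN, UNIT_MAX = 4096, 16384

if __name__ == "__main__":
    for off, ln in [(16384, 16384), (4096, 16384), (0, 12288), (0, 32768)]:
        print(off, ln, atomic_write_ok(off, ln, UNIT_MIN, UNIT_MAX))
```

Only the first pair passes: the second is not naturally aligned, the third is
not a power of two, and the fourth exceeds the assumed maximum.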

extsize is a hint to the FS to allocate extents at a certain logical alignment
and size. forcealign builds on this by forcing the allocator to guarantee the
same alignment for the physical blocks as well, which is essential for atomic
writes.
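
The difference between the two can be sketched with a toy extent model. This
is only an illustration under assumed semantics, not filesystem code: the
Extent type, its fields, and both helper functions are hypothetical, and all
values are in FS blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    logical: int   # starting file offset, in FS blocks
    physical: int  # starting disk address, in FS blocks
    length: int    # extent length, in FS blocks


def logically_aligned(ext: Extent, extsize: int) -> bool:
    """What an extsize hint aims for: file offset and length aligned."""
    return ext.logical % extsize == 0 and ext.length % extsize == 0


def physically_aligned(ext: Extent, extsize: int) -> bool:
    """What forcealign adds: the disk address is aligned as well."""
    return logically_aligned(ext, extsize) and ext.physical % extsize == 0


if __name__ == "__main__":
    extsize = 4  # e.g. 16K extsize with 4K blocks
    hint_only = Extent(logical=0, physical=6, length=4)
    forced = Extent(logical=4, physical=8, length=4)
    print(logically_aligned(hint_only, extsize), physically_aligned(hint_only, extsize))
    print(logically_aligned(forced, extsize), physically_aligned(forced, extsize))
```

The first extent satisfies the hint but starts at disk block 6, so a 4-block
write to it would not be naturally aligned on disk.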

   ** Points of discussion **

The extsize hint feature is already supported by XFS [1], with forcealign still
under development and discussion [2].

From
https://urldefense.com/v3/__https://lore.kernel.org/linux-xfs/20241212013433.GC6678@frogsfrogsfrogs/__;!!ACWV5N9M2RV99hQ!IuMiPMbR5L3B8f31W8tbRlB7d0dMLg2nxW8k7KOGF3t031T99wahnbwnIeDn6N3AdveQJvmbL4V_FBwB0T9U9Q$
thread, the alternative to forcealign for XFS is to use a
software-emulated fallback for unaligned atomic writes. I am looking at a
PoC implementation now. Note that this does rely on CoW.

There has been pushback on forcealign for XFS, so we need to prove/disprove
that this software-emulated fallback can work, see
https://urldefense.com/v3/__https://lore.kernel.org/linux-xfs/20240924061719.GA11211@xxxxxx/__;!!ACWV5N9M2RV99hQ!IuMiPMbR5L3B8f31W8tbRlB7d0dMLg2nxW8k7KOGF3t031T99wahnbwnIeDn6N3AdveQJvmbL4V_FBwv-uf6Ig$


Hey John,

Thanks for taking a look. I did go through the 2 series some time back.
I agree that there are some open challenges in getting the multi-block
atomic write interface correct, especially for mixed mappings, and this is
one of the main reasons we want to explore the exchange_range fallback
in case blocks are not aligned.

Right, so for XFS I am looking at a CoW-based fallback for unaligned/mixed-mapping atomic writes. I have no idea how this could work for ext4.


That being said, I believe forcealign as a feature still holds a lot
of relevance as:

1. Right now, it is the only way to guarantee aligned blocks and hence
    guarantee that our atomic writes can always benefit from hardware atomic
    write support. IIUC DBs are not very keen on losing out on performance
    due to some writes going via the software fallback path.

Sure, we need performance figures for this first.


2. Not all FSes support CoW (a major example being ext4) and hence it will
    be very difficult to have a software fallback in case the blocks are
    not aligned.

Understood


3. As pointed out in [1], even with exchange_range there is still value
    in having forcealign to find the new blocks to be exchanged.

Yeah, again, we need performance figures.

For my test case, I am trying 16K atomic writes with a 4K FS block size, so I expect the software fallback to not kick in often after running the system for a while (as eventually we will get aligned allocations). I am concerned about the prospect of heavily fragmented files, though.
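
As a rough illustration of when the fallback would be expected to kick in for
this test case, here is a toy model. It is a sketch under stated assumptions,
not the XFS PoC: extents are hypothetical (logical, physical, length) tuples
in 4K blocks, a 16K write is 4 blocks, and "mixed mapping" is modelled only as
a write that is not covered by a single extent.

```python
from typing import List, Tuple

# (logical start, physical start, length), all in FS blocks. Hypothetical layout.
ExtentMap = List[Tuple[int, int, int]]


def write_path(extents: ExtentMap, offset: int, nblocks: int) -> str:
    """Return "hardware" if one extent covers the write and the disk address
    is naturally aligned to the write size, else "fallback"."""
    for logical, physical, length in extents:
        if logical <= offset < logical + length:
            covered = offset + nblocks <= logical + length
            disk_addr = physical + (offset - logical)
            if covered and disk_addr % nblocks == 0:
                return "hardware"
            return "fallback"
    return "fallback"  # unmapped range (hole)


if __name__ == "__main__":
    fragmented = [(0, 100, 4), (4, 203, 2), (6, 301, 2), (8, 400, 8), (16, 501, 4)]
    for off in (0, 4, 8, 12, 16):
        print(off, write_path(fragmented, off, 4))
```

In this layout the writes at block offsets 4 and 16 take the fallback: the
first spans two small extents, the second is covered by one extent whose disk
address is not a multiple of 4 blocks. A heavily fragmented file would look
like the first case for most offsets.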


I agree that forcealign is not the only way we can have atomic writes
work but I do feel there is value in having forcealign for FSes and
hence we should have a discussion around it so we can get the interface
right.


I thought that the interface for forcealign according to the candidate XFS implementation was quite straightforward, no?

What was not clear was the age-old issue of how to issue an atomic write of mixed extents, which is really an atomic write issue.

Just to be clear, the intention of this proposal is mainly to discuss
forcealign as a feature. I am hoping there will be a separate
proposal to discuss atomic writes and the plethora of other open
challenges there ;)

Thanks,
John



