[cc linux-xfs]

On Wed, Apr 19, 2023 at 08:57:30AM -0400, Paul Khuong wrote:
> Hi Darrick,
>
> I think FIEXCHANGE_RANGE will be a perfect fit for a lot of my needs.
> In the meantime, would it be (crash-)safe to use XFS_IOC_SWAPEXT as a
> less flexible replacement?

TLDR: SWAPEXT might work for atomic file content swaps, but I haven't
done the work to ensure that it always does, and I don't think anyone
else has. FIEXCHANGE (without NONATOMIC) will; that's where all my QA
effort has been focused.

SWAPEXT doesn't guarantee atomicity of the extents swapped. It might
survive an unexpected crash if you do not have reverse mapping enabled
on the filesystem /and/ you mount with -o wsync. But keep in mind that
all the SWAPEXT fstests (until recently) only checked that fsr works ok,
and fsr calls the kernel with two files that have identical contents.

Also, as you note below, SWAPEXT refuses to run if fdtmp has more
extents than fdtarget, which makes it doubly unsuitable for atomic file
commits since ... reversing the order of the fds and retrying is kinda
gross.

FIEXCHANGE adds the necessary logging support and removes all those
restrictions so that it /can/ be used for atomic file data commits, in
more or less the sequence you lay out below. I'll evaluate your steps
as if you were asking about FIEXCHANGE. ;)

> I tested the following sequence in XFS
> (https://gist.github.com/pkhuong/d41f42b1536592cb0dace837c17cb402),
> and, while the happy path seems to work, I'm not fully convinced it
> won't corrupt my data on system crash.
>
> 0. Assume mutual exclusion is handled somewhere else, so we don't have
> to worry about concurrent writes/swaps

Yes, the kernel locks and flushes both files once you issue the
FIEXCHANGE call.

> 1. open data file
> 2. open O_TMPFILE (* should I instead use a named temporary file?)

Either's fine; FIEXCHANGE operates on file descriptors, not paths.

> 3. FICLONE data file in tmpfile
> 4. overwrite some bytes in tmpfile, without changing its size
> 5. fsync the tmpfile

FIEXCHANGE flushes both files for you after taking the kernel locks.

> 6. SWAPEXT data file "into" tmpfile
> 7. If that failed, try to SWAPEXT the tmpfile into the data file

No need for #7; FIEXCHANGE isn't like FIDEDUPERANGE, where there's an
implied direction.

> The last two steps are already strange, because they clearly diverge
> from what xfs_fsr does. AFAICT, there's a kernel-side check that
> "fdtmp" isn't more fragmented than "fdtarget", so I usually have to
> try to set my tmpfile as "fdtarget" and the actual data file as
> "fdtmp." The behaviour when nothing crashes is still correct: a swap
> is commutative.
>
> I'm worried about seeing a mix of the initial data file and the
> tmpfile's contents after a system crash. In other words, does
> XFS_IOC_SWAPEXT currently implement something like
> FILE_XCHG_RANGE_NONATOMIC (and if so, would XFS be willing to pin down
> and document the order in which extents are swapped? ;), or is the
> swap actually crash-atomic?

It's not guaranteed to be crash atomic at all, and I don't want to
expand the support scope of an ioctl that will soon be part of the
legacy API. Everyone else, please focus on getting FIEXCHANGE reviewed
so we can merge it upstream.

--D

> TIA,
>
> Paul Khuong
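For anyone following along, steps 1-5 of Paul's sequence (everything before the swap) can be sketched in C roughly as follows. This is a hypothetical illustration, not code from the linked gist: make_patched_clone() and open_anon_tmpfile() are made-up names, and the ".patch-XXXXXX" template is an arbitrary choice. It assumes a reflink-capable filesystem (FICLONE fails with EOPNOTSUPP on e.g. ext4 or tmpfs). It falls back to a named-then-unlinked temp file when O_TMPFILE is unavailable, which matches the "either's fine" answer above. It deliberately stops before the exchange step, because SWAPEXT is not crash-atomic and the FIEXCHANGE_RANGE interface discussed here was not yet merged.

```c
#define _GNU_SOURCE /* for O_TMPFILE and mkstemp() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h> /* FICLONE */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open an anonymous temp file in dir, which must be
 * on the same filesystem as the data file for FICLONE to work. Prefers
 * O_TMPFILE; otherwise creates a named file and unlinks it right away.
 * Returns an fd or -errno. */
static int open_anon_tmpfile(const char *dir)
{
    int fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd >= 0)
        return fd;
    /* EOPNOTSUPP: fs lacks O_TMPFILE; EISDIR: kernel predates it. */
    if (errno != EOPNOTSUPP && errno != EISDIR)
        return -errno;

    char path[4096];
    if (snprintf(path, sizeof path, "%s/.patch-XXXXXX", dir) >= (int)sizeof path)
        return -ENAMETOOLONG;
    fd = mkstemp(path);
    if (fd < 0)
        return -errno;
    unlink(path); /* only the descriptor matters from here on */
    return fd;
}

/* Hypothetical helper for steps 2-5: reflink the whole data file into a
 * temp file, overwrite len bytes at off without changing the size, and
 * fsync. The data file itself is never written. Returns the temp fd
 * (the caller then performs the exchange) or -errno. */
static int make_patched_clone(int data_fd, const char *dir,
                              const void *patch, size_t len, off_t off)
{
    assert(patch != NULL || len == 0);

    struct stat st;
    if (fstat(data_fd, &st) < 0)
        return -errno;
    /* Step 4 must stay within the current file size. */
    if (off < 0 || (off_t)len > st.st_size || off > st.st_size - (off_t)len)
        return -EINVAL;

    int tmp = open_anon_tmpfile(dir);            /* step 2 */
    if (tmp < 0)
        return tmp;

    int err = 0;
    const char *p = patch;
    if (ioctl(tmp, FICLONE, data_fd) < 0) {      /* step 3: share extents */
        err = errno;
        goto fail;
    }

    while (len > 0) {                            /* step 4: CoW overwrite */
        ssize_t n = pwrite(tmp, p, len, off);
        if (n <= 0) {
            if (n < 0 && errno == EINTR)
                continue;
            err = n < 0 ? errno : EIO;
            goto fail;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }

    /* Step 5. Per the reply, FIEXCHANGE flushes both files itself, so
     * this is redundant there, but harmless. */
    if (fsync(tmp) < 0) {
        err = errno;
        goto fail;
    }
    return tmp;

fail:
    close(tmp);
    return -err;
}
```

A caller would then hand data_fd and the returned descriptor to the exchange call once, with no reverse-order retry, and close the temp file afterwards. Because the size check runs before anything is opened, a patch that would grow the file fails early with EINVAL instead of leaving a half-built temp file behind.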