Re: Abusing XFS_IOC_SWAPEXT until FIEXCHANGE_RANGE is merged

[cc linux-xfs]

On Wed, Apr 19, 2023 at 08:57:30AM -0400, Paul Khuong wrote:
> Hi Darrick,
> 
> I think FIEXCHANGE_RANGE will be a perfect fit for a lot of my needs.
> In the meantime, would it be (crash-)safe to use XFS_IOC_SWAPEXT as a
> less flexible replacement?

TLDR: SWAPEXT might work for atomic file content swaps, but I haven't
done the work to ensure that it always does, and I don't think anyone
else has.  FIEXCHANGE (without NONATOMIC) will; that's where all my QA
effort has been focused.

SWAPEXT doesn't guarantee atomicity of the extents swapped.  It might
survive an unexpected crash if reverse mapping is not enabled on the
filesystem /and/ you mount with -o wsync.

But keep in mind that all the SWAPEXT fstests (until recently) only
checked that fsr works ok, and fsr calls the kernel with two files that
have identical contents.  Also, as you note below, SWAPEXT refuses to
run if the fdtmp has more extents than fdtarget, which makes it doubly
unsuitable for atomic file commits since ... reversing the order of the
fds and retrying is kinda gross.

FIEXCHANGE adds the necessary logging support and removes all those
restrictions so that it /can/ be used for atomic file data commits, in
more or less the sequence you lay out below.  I'll evaluate your steps
as if you were asking about FIEXCHANGE. ;)

> I tested the following sequence in XFS
> (https://gist.github.com/pkhuong/d41f42b1536592cb0dace837c17cb402),
> and, while the happy path seems to work, I'm not fully convinced it
> won't corrupt my data on system crash.
> 
> 0. Assume mutual exclusion is handled somewhere else, so we don't have
> to worry about concurrent writes/swaps

Yes, the kernel locks and flushes both files once you issue the
FIEXCHANGE call.

> 1. open data file
> 2. open O_TMPFILE (* should I instead use a named temporary file?)

Either's fine; FIEXCHANGE operates on file descriptors, not paths.

> 3. FICLONE data file in tmpfile
> 4. overwrite some bytes in tmpfile, without changing its size
> 5. fsync the tmpfile

FIEXCHANGE flushes both files for you after taking the kernel locks.
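
For reference, steps 1-5 as quoted above can be sketched in C roughly
as follows.  This is a minimal sketch, not tested against the gist:
stage_clone() and its parameters are hypothetical names made up for
illustration, and it assumes the tmpfile can live in the data file's
directory so that both files are on the same reflink-capable
filesystem.  The exchange step itself is left out because the
FIEXCHANGE interface is not merged yet.

```c
#define _GNU_SOURCE             /* for O_TMPFILE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <limits.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>           /* FICLONE */

/*
 * Hypothetical helper (not from this thread): stage an edited clone of
 * `path` in an unnamed temporary file.  On success, returns the tmpfile
 * fd and stores the data file fd in *data_fd.  On failure, returns
 * -errno and leaves *data_fd untouched.
 */
int stage_clone(const char *path, const void *buf, size_t len,
                off_t off, int *data_fd)
{
    char dirbuf[PATH_MAX];
    struct stat st;
    int dfd, tfd, err;

    assert(path && buf && data_fd);
    if (strlen(path) >= sizeof(dirbuf))
        return -ENAMETOOLONG;
    strcpy(dirbuf, path);       /* dirname() may modify its argument */

    dfd = open(path, O_RDWR);                           /* step 1 */
    if (dfd < 0)
        return -errno;

    /* Step 4 must not change the file size, so reject writes past EOF. */
    if (fstat(dfd, &st) < 0) { err = errno; goto fail_data; }
    if (off < 0 || off > st.st_size || len > (size_t)(st.st_size - off)) {
        err = EINVAL;
        goto fail_data;
    }

    /* Step 2: unnamed tmpfile in the data file's directory. */
    tfd = open(dirname(dirbuf), O_TMPFILE | O_RDWR, 0600);
    if (tfd < 0) { err = errno; goto fail_data; }

    /* Step 3: share all of the data file's blocks with the tmpfile. */
    if (ioctl(tfd, FICLONE, dfd) < 0) { err = errno; goto fail_tmp; }

    /* Step 4: overwrite bytes in the clone, retrying short writes. */
    while (len > 0) {
        ssize_t n = pwrite(tfd, buf, len, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) { err = n < 0 ? errno : EIO; goto fail_tmp; }
        buf = (const char *)buf + n;
        len -= (size_t)n;
        off += n;
    }

    /* Step 5: flush the staged contents to disk. */
    if (fsync(tfd) < 0) { err = errno; goto fail_tmp; }

    *data_fd = dfd;
    return tfd;

fail_tmp:
    close(tfd);
fail_data:
    close(dfd);
    return -err;
}
```

The caller would then hand both fds to the exchange ioctl and close
them afterwards.  If the exchange never happens, closing the O_TMPFILE
fd discards the staged copy.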

> 6. SWAPEXT data file "into" tmpfile
> 7. If that failed, try to SWAPEXT the tmpfile into the data file

No need for #7, FIEXCHANGE isn't like FIDEDUPERANGE where there's an
implied direction.

> The last two steps are already strange, because they clearly diverge
> from what xfs_fsr does. AFAICT, there's a kernel-side check that
> "fdtmp" isn't more fragmented than "fdtarget", so I usually have to
> try to set my tmpfile as "fdtarget" and the actual data file as
> "fdtmp." The behaviour when nothing crashes is still correct: a swap
> is commutative.
> 
> I'm worried about seeing a mix of the initial data file and the
> tmpfile's contents after a system crash. In other words, does
> XFS_IOC_SWAPEXT currently implement something like
> FILE_XCHG_RANGE_NONATOMIC (and if so, would XFS be willing to pin down
> and document the order in which extents are swapped? ;), or is the
> swap actually crash-atomic?

It's not guaranteed to be crash atomic at all, and I don't want to
expand the support scope of an ioctl that will soon be part of the
legacy API.

Everyone else please focus on getting FIEXCHANGE reviewed so we can
merge that to upstream.

--D

> TIA,
> 
> Paul Khuong


