Hi all,

This series creates a new log incompat feature and log intent items to
track high level progress of swapping ranges of two files and to finish
interrupted work if the system goes down.  It then adds a new
FISWAPRANGE ioctl so that userspace can access the atomic extent
swapping feature.  With this feature, user programs will be able to
update files atomically by opening an O_TMPFILE, reflinking the source
file to it, making whatever updates they want to make, and then
atomically swapping the changed bits back to the source file.  It even
has an optional ability to detect a changed source file and reject the
update.

The intent behind this new userspace functionality is to enable atomic
rewrites of arbitrary parts of individual files.  For years, application
programmers wanting to ensure the atomicity of a file update had to
write the changes to a new file in the same directory, fsync the new
file, rename the new file on top of the old filename, and then fsync the
directory.  People get it wrong all the time, and $fs hacks abound.

With atomic file updates, this is no longer necessary.  Programmers
create an O_TMPFILE, optionally FICLONE the file contents into the
temporary file, make whatever changes they want to the tempfile, and
FISWAPRANGE the contents from the tempfile into the regular file.  The
interface can optionally check the original file's [cm]time and reject
the swap operation if the file has been modified in the meantime.  There
are no fsyncs to take care of; no directory operations at all; and the
fs will take care of finishing the swap operation if the system goes
down in the middle of the swap.  Sample code exercising the use case
mentioned above can be found in the corresponding changes to xfs_io.

Note that this function is /not/ the O_DIRECT atomic file writes concept
that has been floating around for years.  This is constructed entirely
in software, which means that there are no limitations other than the
regular filesystem limits.
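
For reference, the rename-based recipe described above (write a new
file in the same directory, fsync it, rename it over the old name, fsync
the directory) looks roughly like the following.  This is only a sketch
and is not code from this series: the function names and the
".<name>.tmp" temporary naming scheme are made up for illustration, and
it uses nothing beyond the usual POSIX calls.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: replace dir/name with new contents the old way.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int replace_file_via_rename(const char *dir, const char *name,
				   const char *buf, size_t len)
{
	char tmp[NAME_MAX + 1];
	int dirfd, fd, n, saved;

	n = snprintf(tmp, sizeof(tmp), ".%s.tmp", name);
	if (n < 0 || (size_t)n >= sizeof(tmp)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	dirfd = open(dir, O_RDONLY | O_DIRECTORY);
	if (dirfd < 0)
		return -1;

	/* Step 1: write the new contents to a file in the same directory. */
	fd = openat(dirfd, tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		goto fail_dir;

	/* Step 2: fsync the new file before it becomes visible. */
	if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		goto fail_tmp;
	}
	if (close(fd) < 0)
		goto fail_tmp;

	/* Step 3: rename the new file on top of the old filename. */
	if (renameat(dirfd, tmp, dirfd, name) < 0)
		goto fail_tmp;

	/* Step 4: fsync the directory so the rename itself is durable. */
	if (fsync(dirfd) < 0)
		goto fail_dir;

	close(dirfd);
	return 0;

fail_tmp:
	saved = errno;
	unlinkat(dirfd, tmp, 0);
	errno = saved;
fail_dir:
	saved = errno;
	close(dirfd);
	errno = saved;
	return -1;
}
```

Every step here is something a caller can forget or get subtly wrong,
and the whole file has to be rewritten even for a small change; that is
the burden the swap-based interface is meant to remove.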

As a side note, there's an extra motivation behind the kernel
functionality: online repair of file-based metadata.  The atomic file
swap is implemented as an atomic inode fork swap, which means that we
can implement online reconstruction of extended attributes and
directories by building a new one in another inode and atomically
swapping the contents.

Next, we adapt the online filesystem repair code to use atomic extent
swapping.  This enables repair functions to construct a clean copy of a
directory, xattr information, realtime bitmaps, and realtime summary
information in a temporary inode.  If this completes successfully, the
new contents can be swapped atomically into the inode being repaired.
This is essential to avoid making corruption problems worse if the
system goes down in the middle of running repair.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=atomic-file-updates

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=atomic-file-updates

xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=atomic-file-updates
---
 Documentation/filesystems/vfs.rst |   16 +
 fs/ioctl.c                        |   42 ++
 fs/remap_range.c                  |  283 +++++++++++++
 fs/xfs/Makefile                   |    3 
 fs/xfs/libxfs/xfs_defer.c         |   49 ++
 fs/xfs/libxfs/xfs_defer.h         |   11 -
 fs/xfs/libxfs/xfs_errortag.h      |    4 
 fs/xfs/libxfs/xfs_format.h        |   38 ++
 fs/xfs/libxfs/xfs_fs.h            |    2 
 fs/xfs/libxfs/xfs_log_format.h    |   63 +++
 fs/xfs/libxfs/xfs_log_recover.h   |    4 
 fs/xfs/libxfs/xfs_sb.c            |    2 
 fs/xfs/libxfs/xfs_swapext.c       |  793 +++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_swapext.h       |   89 ++++
 fs/xfs/xfs_bmap_item.c            |   13 -
 fs/xfs/xfs_bmap_util.c            |  595 ----------------------------
 fs/xfs/xfs_bmap_util.h            |    3 
 fs/xfs/xfs_error.c                |    3 
 fs/xfs/xfs_extfree_item.c         |    2 
 fs/xfs/xfs_file.c                 |   49 ++
 fs/xfs/xfs_inode.c                |   13 +
 fs/xfs/xfs_inode.h                |    1 
 fs/xfs/xfs_ioctl.c                |  102 +----
 fs/xfs/xfs_ioctl.h                |    4 
 fs/xfs/xfs_ioctl32.c              |    8 
 fs/xfs/xfs_log.c                  |   10 
 fs/xfs/xfs_log_recover.c          |   41 ++
 fs/xfs/xfs_mount.c                |  119 ++++++
 fs/xfs/xfs_mount.h                |    2 
 fs/xfs/xfs_refcount_item.c        |    2 
 fs/xfs/xfs_rmap_item.c            |    2 
 fs/xfs/xfs_super.c                |   26 +
 fs/xfs/xfs_swapext_item.c         |  649 ++++++++++++++++++++++++++++++
 fs/xfs/xfs_swapext_item.h         |   61 +++
 fs/xfs/xfs_trace.c                |    1 
 fs/xfs/xfs_trace.h                |  116 +++++
 fs/xfs/xfs_xchgrange.c            |  721 ++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_xchgrange.h            |   30 +
 include/linux/fs.h                |   14 +
 include/uapi/linux/fiexchange.h   |  101 +++++
 40 files changed, 3363 insertions(+), 724 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_swapext.c
 create mode 100644 fs/xfs/libxfs/xfs_swapext.h
 create mode 100644 fs/xfs/xfs_swapext_item.c
 create mode 100644 fs/xfs/xfs_swapext_item.h
 create mode 100644 fs/xfs/xfs_xchgrange.c
 create mode 100644 fs/xfs/xfs_xchgrange.h
 create mode 100644 include/uapi/linux/fiexchange.h