This series introduces a proof-of-concept for buffered block atomic writes.

There is a requirement for userspace to be able to issue a write which will not be torn due to a HW or other failure. A solution is presented in [0] and [1].

Those series only support atomic writes for direct IO. The primary target of atomic (or untorn) writes is DBs like InnoDB/MySQL, which require direct IO support. However, as mentioned in [2], there is a desire to support atomic writes for DBs which use buffered writes, like Postgres.

The issue raised in [2] was that the proposed API is not suitable for buffered atomic writes. Specifically, since the API permits a range of atomic write sizes, it is too difficult to track in the pagecache the geometry of atomic writes which overlap with other atomic writes of differing size and alignment. In addition, tracking and handling overlapping atomic and non-atomic writes is also difficult.

In this series, buffered atomic writes are supported based upon the following principles:
- A buffered atomic write requires the RWF_ATOMIC flag to be set, same as for direct IO. The other atomic write rules also apply, such as power-of-2 size and natural alignment.
- For an inode, only a single size of buffered atomic write is allowed. So for statx, atomic_write_unit_min = atomic_write_unit_max always for buffered atomic writes.
- A single folio maps to an atomic write in the pagecache. Folios match atomic writes well, as an atomic write must be a power-of-2 in size and naturally aligned.
- A folio is tagged as "atomic" when atomically written. If any part of an "atomic" folio is fully or partially overwritten with a non-atomic write, the folio loses its atomicity. Indeed, issuing a non-atomic write over an atomic write would typically be seen as a userspace bug.
- If userspace wants to guarantee that a buffered atomic write is written to media atomically after the write syscall returns, it must use RWF_SYNC or similar (along with RWF_ATOMIC).
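The principles above can be modelled in a few lines of C. This is a hypothetical userspace sketch, not code from the series. The names struct inode_model, struct folio_model, atomic_write_valid() and folio_model_write() are invented for illustration. The single per-inode unit stands in for atomic_write_unit_min = atomic_write_unit_max, and the atomic field stands in for the PG_atomic flag. The sketch shows two things: how an RWF_ATOMIC buffered write is admitted, and how a folio gains and loses its "atomic" tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-inode state: the one buffered atomic write size
 * allowed for this inode (unit_min == unit_max). 0 means unsupported. */
struct inode_model {
	uint64_t atomic_unit;
};

/* Hypothetical stand-in for a pagecache folio; 'atomic' models PG_atomic. */
struct folio_model {
	uint64_t pos;   /* file offset of the folio's first byte */
	uint64_t size;  /* folio size in bytes, always a power of 2 */
	bool atomic;    /* set while the folio holds an untorn atomic write */
};

static bool is_power_of_2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* An RWF_ATOMIC buffered write is accepted only if its length equals the
 * inode's single atomic unit and its offset is naturally aligned to it. */
static bool atomic_write_valid(const struct inode_model *inode,
			       uint64_t pos, uint64_t len)
{
	uint64_t unit = inode->atomic_unit;

	if (!is_power_of_2(unit))
		return false;
	return len == unit && (pos & (unit - 1)) == 0;
}

/* Apply a write of [pos, pos + len) to one folio. An atomic write that
 * covers exactly the folio tags it; any non-atomic write that overlaps
 * the folio, fully or partially, clears the tag. */
static void folio_model_write(struct folio_model *folio, uint64_t pos,
			      uint64_t len, bool atomic)
{
	bool overlaps = pos < folio->pos + folio->size &&
			folio->pos < pos + len;

	assert(is_power_of_2(folio->size));
	if (len == 0 || !overlaps)
		return;
	if (atomic)
		folio->atomic = (pos == folio->pos && len == folio->size);
	else
		folio->atomic = false;
}
```

Because each inode has exactly one atomic unit and one folio covers exactly one atomic write, admission reduces to a size comparison and an alignment mask. There is no need to track overlapping atomic writes of differing geometry, which was the problem raised in [2].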
This series only supports buffered atomic writes for XFS. I do have some patches for buffered atomic writes through the bdev file operations. I did not include them, as:
a. I don't know of any requirement for this support.
b. atomic_write_unit_min and atomic_write_unit_max would be fixed at PAGE_SIZE there. This is very limiting. However, an API like BLKBSZSET could be added to allow userspace to program the values for atomic_write_unit_{min, max}.
c. We may want to support atomic_write_unit_{min, max} < PAGE_SIZE, and this becomes more complicated to support.
d. I would like to see what happens with the bs > ps work there.

This series is just an early proof-of-concept, to prove that the API proposed for block atomic writes can work for buffered IO. I would like to unblock that direct IO series and have it merged.

Patches are based on [0], [1], and [3] (the bs > ps series). For the bs > ps series, I had to borrow an earlier filemap change which allows the folio min and max order to be selected.

All patches can be found at:
https://github.com/johnpgarry/linux/tree/atomic-writes-v6.9-v6-fs-v2-buffered

[0] https://lore.kernel.org/linux-block/20240326133813.3224593-1-john.g.garry@xxxxxxxxxx/
[1] https://lore.kernel.org/linux-block/20240304130428.13026-1-john.g.garry@xxxxxxxxxx/
[2] https://lore.kernel.org/linux-fsdevel/20240228061257.GA106651@xxxxxxx/
[3] https://lore.kernel.org/linux-xfs/20240313170253.2324812-1-kernel@xxxxxxxxxxxxxxxx/

John Garry (7):
  fs: Rename STATX{_ATTR}_WRITE_ATOMIC -> STATX{_ATTR}_WRITE_ATOMIC_DIO
  filemap: Change mapping_set_folio_min_order() -> mapping_set_folio_orders()
  mm: Add PG_atomic
  fs: Add initial buffered atomic write support info to statx
  fs: iomap: buffered atomic write support
  fs: xfs: buffered atomic writes statx support
  fs: xfs: Enable buffered atomic writes

 block/bdev.c                   |  9 +++---
 fs/iomap/buffered-io.c         | 53 +++++++++++++++++++++++++++++-----
 fs/iomap/trace.h               |  3 +-
 fs/stat.c                      | 26 ++++++++++++-----
 fs/xfs/libxfs/xfs_inode_buf.c  |  8 +++++
 fs/xfs/xfs_file.c              | 12 ++++++--
 fs/xfs/xfs_icache.c            | 10 ++++---
 fs/xfs/xfs_ioctl.c             |  3 ++
 fs/xfs/xfs_iops.c              | 11 +++++--
 include/linux/fs.h             |  3 +-
 include/linux/iomap.h          |  1 +
 include/linux/page-flags.h     |  5 ++++
 include/linux/pagemap.h        | 20 ++++++++-----
 include/trace/events/mmflags.h |  3 +-
 include/uapi/linux/stat.h      |  6 ++--
 mm/filemap.c                   |  8 ++++-
 16 files changed, 141 insertions(+), 40 deletions(-)

-- 
2.31.1