On Thu, Feb 27, 2025 at 06:08:06PM +0000, John Garry wrote:
> Currently atomic write support requires dedicated HW support. This imposes
> a restriction on the filesystem that disk blocks need to be aligned and
> contiguously mapped to FS blocks to issue atomic writes.
>
> XFS has no method to guarantee FS block alignment for regular,
> non-RT files. As such, atomic writes are currently limited to 1x FS block
> there.
>
> To deal with the scenario that we are issuing an atomic write over
> misaligned or discontiguous data blocks - and raise the atomic write size
> limit - support a SW-based software emulated atomic write mode. For XFS,
> this SW-based atomic writes would use CoW support to issue emulated untorn
> writes.
>
> It is the responsibility of the FS to detect discontiguous atomic writes
> and switch to IOMAP_DIO_ATOMIC_SW mode and retry the write. Indeed,
> SW-based atomic writes could be used always when the mounted bdev does
> not support HW offload, but this strategy is not initially expected to be
> used.
>
> Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>

Looks good now, thank you.
Reviewed-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>

--D

> ---
>  Documentation/filesystems/iomap/operations.rst | 16 ++++++++++++++--
>  fs/iomap/direct-io.c                           |  4 +++-
>  include/linux/iomap.h                          |  6 ++++++
>  3 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> index 82bfe0e8c08e..b9757fe46641 100644
> --- a/Documentation/filesystems/iomap/operations.rst
> +++ b/Documentation/filesystems/iomap/operations.rst
> @@ -525,8 +525,20 @@ IOMAP_WRITE`` with any combination of the following enhancements:
>     conversion or copy on write), all updates for the entire file range
>     must be committed atomically as well.
>     Only one space mapping is allowed per untorn write.
> -   Untorn writes must be aligned to, and must not be longer than, a
> -   single file block.
> +   Untorn writes may be longer than a single file block. In all cases,
> +   the mapping start disk block must have at least the same alignment as
> +   the write offset.
> +
> + * ``IOMAP_ATOMIC_SW``: This write is being issued with torn-write
> +   protection via a software mechanism provided by the filesystem.
> +   All the disk block alignment and single bio restrictions which apply
> +   to IOMAP_ATOMIC_HW do not apply here.
> +   SW-based untorn writes would typically be used as a fallback when
> +   HW-based untorn writes may not be issued, e.g. the range of the write
> +   covers multiple extents, meaning that it is not possible to issue
> +   a single bio.
> +   All filesystem metadata updates for the entire file range must be
> +   committed atomically as well.
>
>  Callers commonly hold ``i_rwsem`` in shared or exclusive mode before
>  calling this function.
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index f87c4277e738..575bb69db00e 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -644,7 +644,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  			iomi.flags |= IOMAP_OVERWRITE_ONLY;
>  		}
>
> -		if (iocb->ki_flags & IOCB_ATOMIC)
> +		if (dio_flags & IOMAP_DIO_ATOMIC_SW)
> +			iomi.flags |= IOMAP_ATOMIC_SW;
> +		else if (iocb->ki_flags & IOCB_ATOMIC)
>  			iomi.flags |= IOMAP_ATOMIC_HW;
>
>  		/* for data sync or sync, we need sync completion processing */
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index e7aa05503763..4fa716241c46 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -183,6 +183,7 @@ struct iomap_folio_ops {
>  #define IOMAP_DAX		0
>  #endif /* CONFIG_FS_DAX */
>  #define IOMAP_ATOMIC_HW		(1 << 9) /* HW-based torn-write protection */
> +#define IOMAP_ATOMIC_SW		(1 << 10)/* SW-based torn-write protection */
>
>  struct iomap_ops {
>  	/*
> @@ -434,6 +435,11 @@ struct iomap_dio_ops {
>   */
>  #define IOMAP_DIO_PARTIAL		(1 << 2)
>
> +/*
> + * Use software-based torn-write protection.
> + */
> +#define IOMAP_DIO_ATOMIC_SW		(1 << 3)
> +
>  ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
>  		unsigned int dio_flags, void *private, size_t done_before);
> --
> 2.31.1
>
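
For anyone following the thread, the flag selection in the direct-io.c hunk and the "detect, switch to IOMAP_DIO_ATOMIC_SW, retry" flow from the commit message can be modelled in user space roughly as below. This is only a sketch under stated assumptions, not kernel code: the three IOMAP_* values are copied from the patch, while MODEL_IOCB_ATOMIC, model_iomap_atomic_flags(), model_fs_atomic_write() and the 'contiguous' argument are hypothetical names made up for illustration. How a real filesystem detects a discontiguous range is not covered by this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values copied from the patch. */
#define IOMAP_ATOMIC_HW		(1 << 9)	/* HW-based torn-write protection */
#define IOMAP_ATOMIC_SW		(1 << 10)	/* SW-based torn-write protection */
#define IOMAP_DIO_ATOMIC_SW	(1 << 3)	/* caller asks for the SW mode */

/* Assumption: placeholder bit for this model, not the kernel's IOCB_ATOMIC. */
#define MODEL_IOCB_ATOMIC	(1 << 0)

/* Mirrors the if/else-if that the patch adds to __iomap_dio_rw(). */
static unsigned int model_iomap_atomic_flags(int ki_flags,
					     unsigned int dio_flags)
{
	unsigned int flags = 0;

	if (dio_flags & IOMAP_DIO_ATOMIC_SW)
		flags |= IOMAP_ATOMIC_SW;
	else if (ki_flags & MODEL_IOCB_ATOMIC)
		flags |= IOMAP_ATOMIC_HW;

	/* The if/else-if above can never set both modes at once. */
	assert((flags & (IOMAP_ATOMIC_HW | IOMAP_ATOMIC_SW)) !=
	       (IOMAP_ATOMIC_HW | IOMAP_ATOMIC_SW));
	return flags;
}

/*
 * Hypothetical filesystem-side flow: try the HW mode first and, if the
 * range turns out to be discontiguous, reissue with IOMAP_DIO_ATOMIC_SW.
 * 'contiguous' stands in for whatever extent check a filesystem performs.
 */
static unsigned int model_fs_atomic_write(bool contiguous)
{
	unsigned int dio_flags = 0;
	unsigned int flags;

	flags = model_iomap_atomic_flags(MODEL_IOCB_ATOMIC, dio_flags);
	if ((flags & IOMAP_ATOMIC_HW) && !contiguous) {
		/* HW mode cannot cover this range, so retry in SW mode. */
		dio_flags |= IOMAP_DIO_ATOMIC_SW;
		flags = model_iomap_atomic_flags(MODEL_IOCB_ATOMIC, dio_flags);
	}
	return flags;
}
```

In this model, model_fs_atomic_write(true) ends up with IOMAP_ATOMIC_HW and model_fs_atomic_write(false) ends up with IOMAP_ATOMIC_SW. Note also that IOMAP_DIO_ATOMIC_SW takes precedence even when the atomic iocb flag is clear, which matches the ordering of the checks in the hunk.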