Re: [RFC 1/7] iomap: Don't fall back to buffered write if the write is atomic

John Garry <john.g.garry@xxxxxxxxxx> · Thu, 7 Dec 2023 12:43:12 +0000

On 01/12/2023 22:07, Dave Chinner wrote:
RWF_ATOMIC is no different to RWF_NOWAIT. The API doesn't decide
what can be supported - the filesystems themselves decide what part
of the API they can support and implement those pieces.

To go back to RWF_NOWAIT, for a long time we (XFS) only supported
RWF_NOWAIT on DIO, and buffered reads and writes were given
-EOPNOTSUPP by the filesystem. Then other filesystems started
supporting DIO with RWF_NOWAIT. Then buffered read support was added
to the page cache and XFS, and as other filesystems were converted
they removed the RWF_NOWAIT exclusion check from their read IO
paths.

We are now in the same place with buffered write support for
RWF_NOWAIT. XFS, the page cache and iomap allow buffered writes w/
RWF_NOWAIT, but ext4, btrfs and f2fs still all return -EOPNOTSUPP
because they don't support non-blocking buffered writes yet.

This is the same model we should be applying with RWF_ATOMIC - we
know that over time we'll be able to expand support for atomic
writes across both direct and buffered IO, so we should not be
restricting the API or infrastructure to only allow RWF_ATOMIC w/
DIO. Just have the filesystems reject RWF_ATOMIC w/ -EOPNOTSUPP if
they don't support it, and for those that do, acceptance is conditional
on whether the filesystem supports it for the given type of IO being
done.
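The policy described above, where the API accepts RWF_ATOMIC for any IO type and each filesystem returns -EOPNOTSUPP for the combinations it does not yet implement, can be modelled in a few lines of userspace C. This is a minimal sketch only: `enum io_kind`, `struct fs_atomic_caps` and `fs_check_atomic()` are hypothetical names, not kernel interfaces, and no real filesystem's support matrix is implied.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical IO types, for illustration only. */
enum io_kind { IO_DIRECT, IO_BUFFERED };

/* Hypothetical per-filesystem capability record. */
struct fs_atomic_caps {
	bool atomic_direct;   /* filesystem implements RWF_ATOMIC for DIO */
	bool atomic_buffered; /* filesystem implements RWF_ATOMIC for buffered IO */
};

/*
 * Model of the per-filesystem decision: return 0 if this filesystem
 * supports an atomic write for the given IO type, else -EOPNOTSUPP.
 * Adding buffered support later only flips a capability; the API
 * itself never changes.
 */
int fs_check_atomic(const struct fs_atomic_caps *caps, enum io_kind kind)
{
	bool ok = (kind == IO_DIRECT) ? caps->atomic_direct
				      : caps->atomic_buffered;
	return ok ? 0 : -EOPNOTSUPP;
}
```

A filesystem described as `{ .atomic_direct = true, .atomic_buffered = false }` accepts atomic DIO and rejects atomic buffered writes, mirroring how RWF_NOWAIT support grew one IO type at a time.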

Seriously - an application can easily probe for RWF_ATOMIC support
without needing information to be directly exposed in statx() - just
open an O_TMPFILE file, issue the type of RWF_ATOMIC IO you require to be
supported, and if it returns -EOPNOTSUPP then you can't use
RWF_ATOMIC optimisations in the application....
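The probing approach described above might look roughly like this in userspace C. It is a sketch under stated assumptions: `probe_rwf_atomic()` is a hypothetical helper; the fallback `RWF_ATOMIC` value is the one later merged in Linux 6.11 and may differ from the value used in this RFC; the 4096-byte buffer alignment is an assumed safe choice for O_DIRECT; and pwritev2() requires glibc 2.26 or later. Because real atomic writes are also bounded by device limits, failures other than EOPNOTSUPP (for example EINVAL for an unsupported length) are reported separately rather than treated as "unsupported".

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_ATOMIC
/* Assumption: the value merged in Linux 6.11 (include/uapi/linux/fs.h).
 * The RFC under discussion may use a different value. */
#define RWF_ATOMIC 0x00000040
#endif

/*
 * Hypothetical helper: probe whether the filesystem holding 'dir' accepts
 * an RWF_ATOMIC write of 'len' bytes at offset 0, using direct IO if
 * 'direct' is nonzero and buffered IO otherwise.
 *
 * Returns 1 if the atomic write succeeded, 0 if pwritev2() failed with
 * EOPNOTSUPP, and -1 with errno set for anything else (including open()
 * failing because O_TMPFILE or O_DIRECT is unsupported, or EINVAL when
 * 'len' is outside the device's atomic write limits).
 */
int probe_rwf_atomic(const char *dir, size_t len, int direct)
{
	int fd = open(dir, O_TMPFILE | O_RDWR | (direct ? O_DIRECT : 0), 0600);
	if (fd < 0)
		return -1;

	/* O_DIRECT needs an aligned buffer; 4096 is an assumed safe alignment. */
	void *buf = NULL;
	int err = posix_memalign(&buf, 4096, len);
	if (err) {
		close(fd);
		errno = err;
		return -1;
	}
	memset(buf, 0, len);

	struct iovec iov = { .iov_base = buf, .iov_len = len };
	ssize_t ret = pwritev2(fd, &iov, 1, 0, RWF_ATOMIC);
	int saved_errno = errno;

	free(buf);
	close(fd); /* an unlinked O_TMPFILE inode is released on last close */

	if (ret == (ssize_t)len)
		return 1;
	if (ret < 0 && saved_errno == EOPNOTSUPP)
		return 0;
	errno = (ret < 0) ? saved_errno : EIO; /* short write: treat as error */
	return -1;
}
```

For example, `probe_rwf_atomic("/mnt/scratch", 4096, 1)` would test 4 KiB direct atomic writes on whatever filesystem is mounted at that (hypothetical) path; passing 0 as the last argument probes buffered IO instead, since support may differ by IO type.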

Hi Dave,

To reject RWF_ATOMIC when a file does not support it, how about
something like this:

--->8----

diff --git a/block/fops.c b/block/fops.c
index 273bd8f5a370..d9563ef29dde 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -639,6 +637,9 @@ static int blkdev_open(struct inode *inode, struct file *filp)
 	if (IS_ERR(handle))
 		return PTR_ERR(handle);

+	if (queue_atomic_write_unit_max_bytes(bdev_get_queue(handle->bdev)))
+		filp->f_mode |= FMODE_CAN_ATOMIC_WRITE;
+
 	if (bdev_nowait(handle->bdev))
 		filp->f_mode |= FMODE_NOWAIT;

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4256ec184461..d725c194243c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -185,6 +185,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File supports async nowait buffered writes */
 #define FMODE_BUF_WASYNC	((__force fmode_t)0x80000000)

+/* File supports atomic writes */
+#define FMODE_CAN_ATOMIC_WRITE	((__force fmode_t)0x100000000)
+
 /*
  * Attribute flags.  These should be or-ed together to figure out what
  * has been changed!
@@ -3266,6 +3269,10 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
 			return -EOPNOTSUPP;
 		kiocb_flags |= IOCB_NOIO;
 	}
+	if (flags & RWF_ATOMIC) {
+		if (!(ki->ki_filp->f_mode & FMODE_CAN_ATOMIC_WRITE))
+			return -EOPNOTSUPP;
+	}
 	kiocb_flags |= (__force int) (flags & RWF_SUPPORTED);
 	if (flags & RWF_SYNC)
 		kiocb_flags |= IOCB_DSYNC;
diff --git a/include/linux/types.h b/include/linux/types.h
index 253168bb3fe1..49c754fde1d6 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -153,7 +153,7 @@ typedef u32 dma_addr_t;

 typedef unsigned int __bitwise gfp_t;
 typedef unsigned int __bitwise slab_flags_t;
-typedef unsigned int __bitwise fmode_t;
+typedef u64 __bitwise fmode_t;

 #ifdef CONFIG_PHYS_ADDR_T_64BIT
 typedef u64 phys_addr_t;


----8<------

My concern is that this requires increasing the size of fmode_t, as all
32 available bits are already used.
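To make the width concern concrete: FMODE_BUF_WASYNC already occupies bit 31 (0x80000000), so the proposed FMODE_CAN_ATOMIC_WRITE (0x100000000) is bit 32, which only a 64-bit mode word can hold. Note that unsigned long is 32 bits wide on 32-bit (ILP32) architectures, so there it behaves like the 32-bit case below. This userspace sketch uses hypothetical `fmode32_t`/`fmode64_t` typedefs as stand-ins for the kernel's fmode_t; it is an illustration, not kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for fmode_t: fmode32_t models a 32-bit mode word
 * (including unsigned long on ILP32 ABIs), fmode64_t a 64-bit one.
 */
typedef uint32_t fmode32_t;
typedef uint64_t fmode64_t;

#define MODE_BIT_31 (UINT64_C(1) << 31) /* 0x80000000, as FMODE_BUF_WASYNC */
#define MODE_BIT_32 (UINT64_C(1) << 32) /* 0x100000000, the proposed new flag */

/* Set 'flag' in an empty 32-bit mode word and read the word back. */
uint64_t roundtrip_fmode32(uint64_t flag)
{
	fmode32_t mode = 0;
	mode |= (fmode32_t)flag; /* conversion keeps only the low 32 bits */
	return mode;
}

/* Set 'flag' in an empty 64-bit mode word and read the word back. */
uint64_t roundtrip_fmode64(uint64_t flag)
{
	fmode64_t mode = 0;
	mode |= flag;
	return mode;
}
```

With a 32-bit word the bit-32 flag reads back as 0, so setting it would have no effect and a check like the one added to kiocb_set_rw_flags() would reject RWF_ATOMIC on every file.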

Thanks,
John