Re: [PATCH] netfs: Fix kernel async DIO

Hi David

As your patch was written on top of linux-next, I had to make some small modifications to make it work on mainline (6.13-rc6). The following patch works fine for me on mainline, but I think it would be better to wait for your confirmation / validation (or a new patch) before applying it in production.

#-------- PATCH --------#

diff --git a/linux-6.13-rc6/nba/_orig_fs.netfs.direct_write.c b/linux-6.13-rc6/fs/netfs/direct_write.c
index 88f2adf..94a1ee8 100644
--- a/linux-6.13-rc6/nba/_orig_fs.netfs.direct_write.c
+++ b/linux-6.13-rc6/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
                  * allocate a sufficiently large bvec array and may shorten the
                  * request.
                  */
-                if (async || user_backed_iter(iter)) {
+                if (user_backed_iter(iter)) {
                         n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
                         if (n < 0) {
                                 ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
                         wreq->direct_bv_count = n;
                         wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
                 } else {
+                        /* If this is a kernel-generated async DIO request,
+                         * assume that any resources the iterator points to
+                         * (eg. a bio_vec array) will persist till the end of
+                         * the op.
+                         */
                         wreq->iter = *iter;
                 }


#-------- TESTS --------#

With this patch, Linux 6.13-rc6 builds without errors and '--direct-io=on' works:


18:38:47 root@deb12-lab-10d:~# uname -a
Linux deb12-lab-10d.lab.lan 6.13.0-rc6-amd64 #0 SMP PREEMPT_DYNAMIC Mon Jan 6 18:14:07 CET 2025 x86_64 GNU/Linux

18:39:29 root@deb12-lab-10d:~# losetup
NAME          SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                       DIO LOG-SEC
/dev/loop2046         0      0         0  0 /mnt/FBX24T/FS-LAN/bckcrypt2046   1    4096

18:39:32 root@deb12-lab-10d:~# dmsetup ls | grep bckcrypt
bckcrypt    (254:7)

18:39:55 root@deb12-lab-10d:~# cryptsetup status bckcrypt
/dev/mapper/bckcrypt is active and is in use.
  type:    LUKS2
  cipher:  aes-xts-plain64
  keysize: 512 bits
  key location: keyring
  device:  /dev/loop2046
  loop:    /mnt/FBX24T/FS-LAN/bckcrypt2046
  sector size:  512
  offset:  32768 sectors
  size:    8589901824 sectors
  mode:    read/write

18:40:36 root@deb12-lab-10d:~# df -h | egrep 'cifs|bckcrypt'
//10.0.10.100/FBX24T cifs   22T  13T 9,0T  60% /mnt/FBX24T
/dev/mapper/bckcrypt btrfs 4,0T 3,3T 779G  82% /mnt/bckcrypt


09:08:44 root@deb12-lab-10d:~# LANG=en_US.UTF-8
09:08:46 root@deb12-lab-10d:~# dd if=/dev/zero of=/mnt/bckcrypt/test/test.dd bs=256M count=16 oflag=direct status=progress
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 14 s, 302 MB/s
16+0 records in
16+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 14.2061 s, 302 MB/s



With this patch there are no more write errors when using the '--direct-io=on' option of losetup, and writing to the back-file is more than 20x faster.
It seems to be OK!

Let me know if something's wrong in this patch or if it can safely be used in production.

Thanks again to everyone for the help.
Nicolas



On 2025-01-06 13:07, nicolas.baranger@xxxxxx wrote:

Hi David

Thanks for the work!
I will build Linux 6.10 and mainline with the provided change and will report back here as soon as I have test results (during CET working hours).

Thanks again for your help with this issue.
Nicolas

On 2025-01-06 12:37, David Howells wrote:

Hi Nicolas,

Does the attached fix your problem?

David
---
netfs: Fix kernel async DIO

Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
is supplied with a bio_vec[] array.  Currently, because of the async flag,
this gets passed to netfs_extract_user_iter(), which throws a warning and
fails because it only handles IOVEC and UBUF iterators.  This can be
triggered through a combination of cifs and a loopback blockdev with
something like:

mount //my/cifs/share /foo
dd if=/dev/zero of=/foo/m0 bs=4K count=1K
losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
echo hello >/dev/loop2046

This causes the following to appear in syslog:

WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]

and the write to fail.

Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
causes async kernel DIO writes to be handled as userspace writes.  Note
that this change relies on the kernel caller maintaining the existence of
the bio_vec array (or kvec[] or folio_queue) until the op is complete.

Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Reported-by: Nicolas Baranger <nicolas.baranger@xxxxxx>
Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@xxxxxx/
Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
cc: Steve French <smfrench@xxxxxxxxx>
cc: Jeff Layton <jlayton@xxxxxxxxxx>
cc: netfs@xxxxxxxxxxxxxxx
cc: linux-cifs@xxxxxxxxxxxxxxx
cc: linux-fsdevel@xxxxxxxxxxxxxxx
---
fs/netfs/direct_write.c |    7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index eded8afaa60b..42ce53cc216e 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
                  * allocate a sufficiently large bvec array and may shorten the
                  * request.
                  */
-                if (async || user_backed_iter(iter)) {
+                if (user_backed_iter(iter)) {
                         n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
                         if (n < 0) {
                                 ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
                         wreq->direct_bv_count = n;
                         wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
                 } else {
+                        /* If this is a kernel-generated async DIO request,
+                         * assume that any resources the iterator points to
+                         * (eg. a bio_vec array) will persist till the end of
+                         * the op.
+                         */
                         wreq->buffer.iter = *iter;
                 }
         }