Re: [PATCH] 9p: Fix DIO read through netfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I did some additional testing of this patch after seeing regressions
starting with 6.11-rc4 with xfstests generic/125 and generic/210 and
was able to bisect it to this patch which just went in:

commit e3786b29c54cdae3490b07180a54e2461f42144c
Author: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>

diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index b2405dd4d4d4..3f3842e7b44a 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -217,7 +217,8 @@ static void cifs_req_issue_read(struct
netfs_io_subrequest *subreq)
                        goto out;
        }

-       __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+       if (subreq->rreq->origin != NETFS_DIO_READ)
+               __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);

        rc = rdata->server->ops->async_readv(rdata);
 out:

It is very simple to repro.
I originally noticed it with tests to shares in Azure, but it also
fails locally to Samba with default cifs.ko mount options.


log message is below:

[ 7884.205037] Workqueue: cifsiod smb2_readv_worker [cifs]
[ 7884.205262] RIP: 0010:netfs_subreq_terminated+0x3f0/0x4b0 [netfs]
[ 7884.205299] Code: 01 00 00 e8 02 b4 07 df 4c 8b 4c 24 08 49 89 d8
4c 89 e9 41 8b b4 24 d4 01 00 00 44 89 f2 48 c7 c7 40 10 65 c1 e8 30
a9 b6 de <0f> 0b 48 8b 7c 24 18 4c 8d bd c0 00 00 00 e8 2d b5 07 df 48
8b 7c
[ 7884.205305] RSP: 0018:ff1100010705fce8 EFLAGS: 00010286
[ 7884.205312] RAX: dffffc0000000000 RBX: 0000000000001000 RCX: 0000000000000027
[ 7884.205317] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ff110004cb1b1a08
[ 7884.205322] RBP: ff11000119450900 R08: ffffffffa03e346e R09: ffe21c0099636341
[ 7884.205326] R10: ff110004cb1b1a0b R11: 0000000000000001 R12: ff11000137b68a80
[ 7884.205330] R13: 000000000000012c R14: 0000000000000001 R15: ff11000126a96f78
[ 7884.205335] FS:  0000000000000000(0000) GS:ff110004cb180000(0000)
knlGS:0000000000000000
[ 7884.205339] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7884.205344] CR2: 00007f0035f0a67c CR3: 000000000f664004 CR4: 0000000000371ef0
[ 7884.205354] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7884.205359] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7884.205363] Call Trace:
[ 7884.205367]  <TASK>
[ 7884.205373]  ? __warn+0xa4/0x220
[ 7884.205386]  ? netfs_subreq_terminated+0x3f0/0x4b0 [netfs]
[ 7884.205423]  ? report_bug+0x1d4/0x1e0
[ 7884.205436]  ? handle_bug+0x42/0x80
[ 7884.205442]  ? exc_invalid_op+0x18/0x50
[ 7884.205449]  ? asm_exc_invalid_op+0x1a/0x20
[ 7884.205464]  ? irq_work_claim+0x1e/0x40
[ 7884.205475]  ? netfs_subreq_terminated+0x3f0/0x4b0 [netfs]
[ 7884.205512]  ? netfs_subreq_terminated+0x3f0/0x4b0 [netfs]
[ 7884.205554]  process_one_work+0x4cf/0xb80
[ 7884.205573]  ? __pfx_lock_acquire+0x10/0x10
[ 7884.205582]  ? __pfx_process_one_work+0x10/0x10
[ 7884.205599]  ? assign_work+0xd6/0x110
[ 7884.205609]  worker_thread+0x2cd/0x550
[ 7884.205622]  ? __pfx_worker_thread+0x10/0x10
[ 7884.205632]  kthread+0x187/0x1d0
[ 7884.205639]  ? __pfx_kthread+0x10/0x10
[ 7884.205648]  ret_from_fork+0x34/0x60
[ 7884.205655]  ? __pfx_kthread+0x10/0x10
[ 7884.205661]  ret_from_fork_asm+0x1a/0x30
[ 7884.205684]  </TASK>
[ 7884.205688] irq event stamp: 23635
[ 7884.205692] hardirqs last  enabled at (23641): [<ffffffffa022b58b>]
console_unlock+0x15b/0x170
[ 7884.205699] hardirqs last disabled at (23646): [<ffffffffa022b570>]
console_unlock+0x140/0x170
[ 7884.205705] softirqs last  enabled at (23402): [<ffffffffa0131a6e>]
__irq_exit_rcu+0xfe/0x120
[ 7884.205712] softirqs last disabled at (23397): [<ffffffffa0131a6e>]
__irq_exit_rcu+0xfe/0x120
[ 7884.205718] ---[ end trace 0000000000000000 ]---

On Fri, Aug 9, 2024 at 10:43 PM Dominique Martinet
<asmadeus@xxxxxxxxxxxxx> wrote:
>
> David Howells wrote on Fri, Aug 09, 2024 at 02:56:09PM +0100:
> > From: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
> >
> > 9p: Fix DIO read through netfs
>
> nitpick: now sure how that ended up here but this is duplicated with the
> subject (the commit message ends up with this line twice)
>
> > If a program is watching a file on a 9p mount, it won't see any change in
> > size if the file being exported by the server is changed directly in the
> > source filesystem, presumably because 9p doesn't have change notifications,
> > and because netfs skips the reads if the file is empty.
> >
> > Fix this by attempting to read the full size specified when a DIO read is
> > requested (such as when 9p is operating in unbuffered mode) and dealing
> > with a short read if the EOF was less than the expected read.
> >
> > To make this work, filesystems using netfslib must not set
> > NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF.
> > I don't want to mandatorily clear this flag in netfslib for DIO because,
> > say, ceph might make a read from an object that is not completely filled,
> > but does not reside at the end of file - and so we need to clear the
> > excess.
> >
> > This can be tested by watching an empty file over 9p within a VM (such as
> > in the ktest framework):
> >
> >         while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo
>
> (This is basically the same thing but if one wants to control the read
> timing for more precise/verbose debugging:
>   exec 3< /host/tmp/foo
>   read -u 3 content && echo $content
>   (repeat as appropriate)
>   exec 3>&-
> )
>
> > then writing something into the empty file.  The watcher should immediately
> > display the file content and break out of the loop.  Without this fix, it
> > remains in the loop indefinitely.
> >
> > Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter")
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916
> > Written-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
>
> Thanks for adding extra comments & fixing other filesystems.
>
> I've checked this covers all cases of setting NETFS_SREQ_CLEAR_TAIL so
> hopefully shouldn't have further side effects, this sounds good to me:
>
> Signed-off-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
>
> > Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
> > cc: Eric Van Hensbergen <ericvh@xxxxxxxxxx>
> > cc: Latchesar Ionkov <lucho@xxxxxxxxxx>
> > cc: Christian Schoenebeck <linux_oss@xxxxxxxxxxxxx>
> > cc: Marc Dionne <marc.dionne@xxxxxxxxxxxx>
> > cc: Ilya Dryomov <idryomov@xxxxxxxxx>
> > cc: Steve French <sfrench@xxxxxxxxx>
> > cc: Paulo Alcantara <pc@xxxxxxxxxxxxx>
> > cc: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > cc: v9fs@xxxxxxxxxxxxxxx
> > cc: linux-afs@xxxxxxxxxxxxxxxxxxx
> > cc: ceph-devel@xxxxxxxxxxxxxxx
> > cc: linux-cifs@xxxxxxxxxxxxxxx
> > cc: linux-nfs@xxxxxxxxxxxxxxx
> > cc: netfs@xxxxxxxxxxxxxxx
> > cc: linux-fsdevel@xxxxxxxxxxxxxxx
> > ---
> >  fs/9p/vfs_addr.c     |    3 ++-
> >  fs/afs/file.c        |    3 ++-
> >  fs/ceph/addr.c       |    6 ++++--
> >  fs/netfs/io.c        |   17 +++++++++++------
> >  fs/nfs/fscache.c     |    3 ++-
> >  fs/smb/client/file.c |    3 ++-
> >  6 files changed, 23 insertions(+), 12 deletions(-)
> >
> > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> > index a97ceb105cd8..24fdc74caeba 100644
> > --- a/fs/9p/vfs_addr.c
> > +++ b/fs/9p/vfs_addr.c
> > @@ -75,7 +75,8 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
> >
> >       /* if we just extended the file size, any portion not in
> >        * cache won't be on server and is zeroes */
> > -     __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > +     if (subreq->rreq->origin != NETFS_DIO_READ)
> > +             __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> >
> >       netfs_subreq_terminated(subreq, err ?: total, false);
> >  }
> > diff --git a/fs/afs/file.c b/fs/afs/file.c
> > index c3f0c45ae9a9..ec1be0091fdb 100644
> > --- a/fs/afs/file.c
> > +++ b/fs/afs/file.c
> > @@ -242,7 +242,8 @@ static void afs_fetch_data_notify(struct afs_operation *op)
> >
> >       req->error = error;
> >       if (subreq) {
> > -             __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > +             if (subreq->rreq->origin != NETFS_DIO_READ)
> > +                     __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> >               netfs_subreq_terminated(subreq, error ?: req->actual_len, false);
> >               req->subreq = NULL;
> >       } else if (req->done) {
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index cc0a2240de98..c4744a02db75 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -246,7 +246,8 @@ static void finish_netfs_read(struct ceph_osd_request *req)
> >       if (err >= 0) {
> >               if (sparse && err > 0)
> >                       err = ceph_sparse_ext_map_end(op);
> > -             if (err < subreq->len)
> > +             if (err < subreq->len &&
> > +                 subreq->rreq->origin != NETFS_DIO_READ)
> >                       __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> >               if (IS_ENCRYPTED(inode) && err > 0) {
> >                       err = ceph_fscrypt_decrypt_extents(inode,
> > @@ -282,7 +283,8 @@ static bool ceph_netfs_issue_op_inline(struct netfs_io_subrequest *subreq)
> >       size_t len;
> >       int mode;
> >
> > -     __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > +     if (rreq->origin != NETFS_DIO_READ)
> > +             __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> >       __clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
> >
> >       if (subreq->start >= inode->i_size)
> > diff --git a/fs/netfs/io.c b/fs/netfs/io.c
> > index c179a1c73fa7..5367caf3fa28 100644
> > --- a/fs/netfs/io.c
> > +++ b/fs/netfs/io.c
> > @@ -530,7 +530,8 @@ void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
> >
> >       if (transferred_or_error == 0) {
> >               if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
> > -                     subreq->error = -ENODATA;
> > +                     if (rreq->origin != NETFS_DIO_READ)
> > +                             subreq->error = -ENODATA;
> >                       goto failed;
> >               }
> >       } else {
> > @@ -601,9 +602,14 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq,
> >                       }
> >                       if (subreq->len > ictx->zero_point - subreq->start)
> >                               subreq->len = ictx->zero_point - subreq->start;
> > +
> > +                     /* We limit buffered reads to the EOF, but let the
> > +                      * server deal with larger-than-EOF DIO/unbuffered
> > +                      * reads.
> > +                      */
> > +                     if (subreq->len > rreq->i_size - subreq->start)
> > +                             subreq->len = rreq->i_size - subreq->start;
> >               }
> > -             if (subreq->len > rreq->i_size - subreq->start)
> > -                     subreq->len = rreq->i_size - subreq->start;
> >               if (rreq->rsize && subreq->len > rreq->rsize)
> >                       subreq->len = rreq->rsize;
> >
> > @@ -739,11 +745,10 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
> >       do {
> >               _debug("submit %llx + %llx >= %llx",
> >                      rreq->start, rreq->submitted, rreq->i_size);
> > -             if (rreq->origin == NETFS_DIO_READ &&
> > -                 rreq->start + rreq->submitted >= rreq->i_size)
> > -                     break;
> >               if (!netfs_rreq_submit_slice(rreq, &io_iter))
> >                       break;
> > +             if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags))
> > +                     break;
> >               if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) &&
> >                   test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags))
> >                       break;
> > diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
> > index bf29a65c5027..7a558dea75c4 100644
> > --- a/fs/nfs/fscache.c
> > +++ b/fs/nfs/fscache.c
> > @@ -363,7 +363,8 @@ void nfs_netfs_read_completion(struct nfs_pgio_header *hdr)
> >               return;
> >
> >       sreq = netfs->sreq;
> > -     if (test_bit(NFS_IOHDR_EOF, &hdr->flags))
> > +     if (test_bit(NFS_IOHDR_EOF, &hdr->flags) &&
> > +         sreq->rreq->origin != NETFS_DIO_READ)
> >               __set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags);
> >
> >       if (hdr->error)
> > diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
> > index b2405dd4d4d4..3f3842e7b44a 100644
> > --- a/fs/smb/client/file.c
> > +++ b/fs/smb/client/file.c
> > @@ -217,7 +217,8 @@ static void cifs_req_issue_read(struct netfs_io_subrequest *subreq)
> >                       goto out;
> >       }
> >
> > -     __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > +     if (subreq->rreq->origin != NETFS_DIO_READ)
> > +             __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> >
> >       rc = rdata->server->ops->async_readv(rdata);
> >  out:
> >
>
> --
> Dominique Martinet | Asmadeus
>


-- 
Thanks,

Steve





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux