On Fri, May 7, 2021 at 1:03 PM Jerry Lee <leisurelysw24@xxxxxxxxx> wrote:
>
> Hi,
>
> First, thanks for all the great information and directions! The issue
> can be steadily reproduced in my environment, and it did not occur
> before I upgraded Ceph from v15.2.4 to v15.2.9. I checked the
> changelogs between v15.2.4 and v15.2.9 and reverted some PRs possibly
> related to CephFS and the MDS. Luckily, after reverting all the
> commits of the PR "mds: reduce memory usage of open file table
> prefetch #37382 (pr#37383), https://github.com/ceph/ceph/pull/37383",
> the issue no longer occurs.
>

This should be fixed by the following patch:

diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc
index c534fae1b4..c98cb3ea77 100644
--- a/src/mds/Locker.cc
+++ b/src/mds/Locker.cc
@@ -5469,6 +5469,7 @@ void Locker::file_eval(ScatterLock *lock, bool *need_issue)
   } else if (in->state_test(CInode::STATE_NEEDSRECOVER)) {
     mds->mdcache->queue_file_recover(in);
+    mds->mdcache->do_file_recover();
   }
 }
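My reading of why the extra call matters (treat the comments below as my
interpretation of the code, not an authoritative analysis):
queue_file_recover() only places the inode on the MDS RecoveryQueue, and
on this path nothing subsequently drains that queue, so the inode can sit
marked "recovering" with the ifile lock stuck in mix->sync, which matches
the "currently failed to rdlock, waiting" slow request in the log further
down. Annotated, the patched branch reads:

  // src/mds/Locker.cc, Locker::file_eval(), after the patch
  } else if (in->state_test(CInode::STATE_NEEDSRECOVER)) {
    // Queue the inode so its size/mtime can be re-probed from the OSDs
    // after the client that was writing it went away...
    mds->mdcache->queue_file_recover(in);
    // ...and kick the recovery queue right away, so the recovery actually
    // runs and the ifile lock can complete its mix->sync transition
    // instead of blocking the new client's getattr rdlock indefinitely.
    mds->mdcache->do_file_recover();
  }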
> However, those commits are kind of complicated and I'm still looking
> into them in order to figure out the root cause. If there is anything
> I can do to locate the bug, please let me know, thanks!
>
> - Jerry
>
> On Tue, 4 May 2021 at 20:02, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > IIUC, when a client reboots and mounts again, it becomes a new client,
> > for all intents and purposes. So if the MDS is still maintaining the
> > session from the old (pre-reboot) client, the new client will generally
> > need to wait until that session is evicted before it can grab any caps
> > that that client previously held. This was one of the reasons we added
> > some of the reboot recovery stuff into libcephfs to support the
> > nfs-ganesha client use-case.
> >
> > Assuming that's the case here, we might be able to eventually improve
> > that by having kclients set their identity on the session at mount time
> > (a la ceph_set_uuid), and then it could tell the MDS that it was safe to
> > release the state that that client previously held. That would mean
> > generating a unique per-client ID that was invariant across reboots, but
> > we could consider it.
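For reference, the pre-mount setup for the libcephfs reboot recovery Jeff
mentions looks roughly like the sketch below. This is only an
illustration of the relevant C API calls; the uuid string and timeout are
made-up example values, and error handling is omitted.

  #include <cephfs/libcephfs.h>

  struct ceph_mount_info *cmount;

  ceph_create(&cmount, NULL);           /* NULL = default client id */
  ceph_conf_read_file(cmount, NULL);    /* default ceph.conf search path */

  /* Present the same uuid on every mount from this host, so the MDS can
   * associate the new session with the state the pre-reboot session held
   * instead of waiting for the stale session to be evicted.  Both calls
   * must happen before ceph_mount().  (See also ceph_start_reclaim() /
   * ceph_finish_reclaim() for explicitly reclaiming the old session.) */
  ceph_set_uuid(cmount, "node1-cephfs-client");  /* example uuid */
  ceph_set_session_timeout(cmount, 30);          /* example value, seconds */

  ceph_mount(cmount, "/");
  /* ... do I/O ... */
  ceph_unmount(cmount);
  ceph_release(cmount);

Jeff's idea, as I understand it, is essentially to have the kclient do the
equivalent automatically with a per-host ID that survives reboots.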
> >
> > -- Jeff
> >
> > On Mon, 2021-05-03 at 14:20 -0700, Gregory Farnum wrote:
> > > I haven't looked at the logs, but it's expected that when a client
> > > disappears while it's holding caps, the MDS will wait through the
> > > session timeout period before revoking those capabilities. This means
> > > that if all your clients are reading the file, writes will be blocked
> > > until the session timeout passes. The details of exactly which
> > > operations will be allowed vary quite a lot depending on the exact
> > > system state when the client disappeared (if it held write caps, most
> > > read operations will also be blocked, and new clients trying to look
> > > at it will certainly be blocked).
> > >
> > > I don't remember exactly how specific kernel client blocklists are,
> > > but there may be something going on there that makes things extra hard
> > > on the rebooted node if it's maintaining the same IP addresses.
> > >
> > > If you have other monitoring software to detect failures, there are
> > > ways to evict clients before the session timeout passes (or you could
> > > have the rebooted node do so), and these are discussed in the docs.
> > >
> > > -Greg
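The manual eviction Greg mentions is along the lines of the commands
below; check the client eviction section of the docs for the exact
syntax and the blocklist caveats:

  # ceph tell mds.0 client ls
  # ceph tell mds.0 client evict id=<stale client id>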
> > >
> > > On Tue, Apr 27, 2021 at 9:35 PM Jerry Lee <leisurelysw24@xxxxxxxxx> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I deployed a 3-node Ceph cluster (v15.2.9), and the CephFS is
> > > > mounted via kclient (linux-4.14.24) on all 3 nodes. Each of the
> > > > kclients updates (reads/writes) a certain file periodically in
> > > > order to know whether the CephFS is alive or not. After a kclient
> > > > gets evicted due to an abnormal reboot, a new kclient mounts the
> > > > CephFS when the node comes back. However, the newly mounted
> > > > kclient gets stuck when it tries to getattr on the common file.
> > > > Under such conditions, all of the other kclients are affected and
> > > > cannot update the common file either. From the debugfs entries, a
> > > > request does get stuck:
> > > > ------
> > > > [/sys/kernel/debug/ceph/1bbb7753-85e5-4d33-a860-84419fdcfd7d.client3230166]
> > > > # cat mdsc
> > > > 12 mds0 getattr #100000003ed
> > > >
> > > > [/sys/kernel/debug/ceph/1bbb7753-85e5-4d33-a860-84419fdcfd7d.client3230166]
> > > > # cat osdc
> > > > REQUESTS 0 homeless 0
> > > > LINGER REQUESTS
> > > > BACKOFFS
> > > >
> > > > [/sys/kernel/debug/ceph/1bbb7753-85e5-4d33-a860-84419fdcfd7d.client3230166]
> > > > # ceph -s
> > > >   cluster:
> > > >     id:     1bbb7753-85e5-4d33-a860-84419fdcfd7d
> > > >     health: HEALTH_WARN
> > > >             1 MDSs report slow requests
> > > >
> > > >   services:
> > > >     mon: 3 daemons, quorum Jerry-ceph-n2,Jerry-x85-n1,Jerry-x85-n3 (age 23h)
> > > >     mgr: Jerry-x85-n1(active, since 25h), standbys: Jerry-ceph-n2, Jerry-x85-n3
> > > >     mds: cephfs:1 {0=qceph-mds-Jerry-ceph-n2=up:active} 1 up:standby-replay 1 up:standby
> > > >     osd: 18 osds: 18 up (since 23h), 18 in (since 23h)
> > > > ------
> > > >
> > > > The MDS logs (debug_mds = 20) are provided here:
> > > > https://drive.google.com/file/d/1aj101NOTzCsfDdC-neqVTvKpEPOd3M6Q/view?usp=sharing
> > > >
> > > > Some of the logs w.r.t. client.3230166 and ino 0x100000003ed are shown below:
> > > > 2021-04-27T11:57:03.467+0800 7fccbd3be700 4 mds.0.server handle_client_request client_request(client.3230166:12 getattr pAsLsXsFs #0x100000003ed 2021-04-27T11:57:03.469426+0800 caller_uid=0, caller_gid=0{}) v2
> > > > 2021-04-27T11:57:03.467+0800 7fccbd3be700 20 mds.0.98 get_session have 0x56130c5ce480 client.3230166 v1:192.168.92.89:0/679429733 state open
> > > > 2021-04-27T11:57:03.467+0800 7fccbd3be700 15 mds.0.server oldest_client_tid=12
> > > > 2021-04-27T11:57:03.467+0800 7fccbd3be700 7 mds.0.cache request_start request(client.3230166:12 nref=2 cr=0x56130db96480)
> > > > 2021-04-27T11:57:03.467+0800 7fccbd3be700 7 mds.0.server dispatch_client_request client_request(client.3230166:12 getattr pAsLsXsFs #0x100000003ed 2021-04-27T11:57:03.469426+0800 caller_uid=0, caller_gid=0{}) v2
> > > >
> > > > 2021-04-27T11:57:03.467+0800 7fccbd3be700 10 mds.0.locker acquire_locks request(client.3230166:12 nref=3 cr=0x56130db96480)
> > > > 2021-04-27T11:57:03.467+0800 7fccbd3be700 10 mds.0.cache.ino(0x100000003ed) auth_pin by 0x56130c584ed0 on [inode 0x100000003ed [2,head] /QTS/VOL_1/.ovirt_data_domain/37065419-e7f3-47ca-97df-8af0c67d30a0/dom_md/ids auth v91583 pv91585 ap=4 recovering s=1048576 n(v0 rc2021-04-27T11:57:02.625542+0800 b1048576 1=1+0) (ifile mix->sync w=1) (iversion lock) cr={3198501=0-4194304@1,3198504=0-4194304@1,3221169=0-4194304@1} caps={3198501=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@17,3198504=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@9,3230166=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@2} | ptrwaiter=1 request=1 lock=2 caps=1 dirty=1 waiter=0 authpin=1 0x56130c584a00] now 4
> > > > 2021-04-27T11:57:03.467+0800 7fccbd3be700 15 mds.0.cache.dir(0x100000003eb) adjust_nested_auth_pins 1 on [dir 0x100000003eb /QTS/VOL_1/.ovirt_data_domain/37065419-e7f3-47ca-97df-8af0c67d30a0/dom_md/ [2,head] auth pv=91586 v=91584 cv=0/0 ap=1+4 state=1610874881|complete f(v0 m2021-04-23T15:00:04.377198+0800 6=6+0) n(v3 rc2021-04-27T11:57:02.625542+0800 b38005818 6=6+0) hs=6+0,ss=0+0 dirty=4 | child=1 dirty=1 waiter=0 authpin=1 0x56130c586a00] by 0x56130c584a00 count now 1/4
> > > > 2021-04-27T11:57:03.467+0800 7fccbd3be700 10 mds.0 RecoveryQueue::prioritize not queued [inode 0x100000003ed [2,head] /QTS/VOL_1/.ovirt_data_domain/37065419-e7f3-47ca-97df-8af0c67d30a0/dom_md/ids auth v91583 pv91585 ap=4 recovering s=1048576 n(v0 rc2021-04-27T11:57:02.625542+0800 b1048576 1=1+0) (ifile mix->sync w=1) (iversion lock) cr={3198501=0-4194304@1,3198504=0-4194304@1,3221169=0-4194304@1} caps={3198501=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@17,3198504=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@9,3230166=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@2} | ptrwaiter=1 request=1 lock=2 caps=1 dirty=1 waiter=0 authpin=1 0x56130c584a00]
> > > > 2021-04-27T11:57:03.467+0800 7fccbd3be700 7 mds.0.locker rdlock_start waiting on (ifile mix->sync w=1) on [inode 0x100000003ed [2,head] /QTS/VOL_1/.ovirt_data_domain/37065419-e7f3-47ca-97df-8af0c67d30a0/dom_md/ids auth v91583 pv91585 ap=4 recovering s=1048576 n(v0 rc2021-04-27T11:57:02.625542+0800 b1048576 1=1+0) (ifile mix->sync w=1) (iversion lock) cr={3198501=0-4194304@1,3198504=0-4194304@1,3221169=0-4194304@1} caps={3198501=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@17,3198504=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@9,3230166=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@2} | ptrwaiter=1 request=1 lock=2 caps=1 dirty=1 waiter=0 authpin=1 0x56130c584a00]
> > > > 2021-04-27T11:57:03.467+0800 7fccbd3be700 10 mds.0.cache.ino(0x100000003ed) add_waiter tag 2000000040000000 0x56130ea1bbe0 !ambig 1 !frozen 1 !freezing 1
> > > > 2021-04-27T11:57:03.467+0800 7fccbd3be700 15 mds.0.cache.ino(0x100000003ed) taking waiter here
> > > > 2021-04-27T11:57:03.468+0800 7fccbd3be700 20 mds.0.locker client.3230166 pending pAsLsXsFr allowed pAsLsXsFrl wanted pAsxXsxFsxcrwb
> > > > 2021-04-27T11:57:03.468+0800 7fccbd3be700 7 mds.0.locker handle_client_caps on 0x100000003ed tid 0 follows 0 op update flags 0x2
> > > > 2021-04-27T11:57:03.468+0800 7fccbd3be700 20 mds.0.98 get_session have 0x56130b81f600 client.3198501 v1:192.168.50.108:0/2478094748 state open
> > > > 2021-04-27T11:57:03.468+0800 7fccbd3be700 10 mds.0.locker head inode [inode 0x100000003ed [2,head] /QTS/VOL_1/.ovirt_data_domain/37065419-e7f3-47ca-97df-8af0c67d30a0/dom_md/ids auth v91583 pv91585 ap=4 recovering s=1048576 n(v0 rc2021-04-27T11:57:02.625542+0800 b1048576 1=1+0) (ifile mix->sync w=1) (iversion lock) cr={3198501=0-4194304@1,3198504=0-4194304@1,3221169=0-4194304@1} caps={3198501=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@17,3198504=pAsLsXsFr/pAsxXsxFsxcrwb@9,3230166=pAsLsXsFr/pAsxXsxFsxcrwb@2} | ptrwaiter=1 request=1 lock=2 caps=1 dirty=1 waiter=1 authpin=1 0x56130c584a00]
> > > > 2021-04-27T11:57:03.468+0800 7fccbd3be700 10 mds.0.locker follows 0 retains pAsLsXsFr dirty - on [inode 0x100000003ed [2,head] /QTS/VOL_1/.ovirt_data_domain/37065419-e7f3-47ca-97df-8af0c67d30a0/dom_md/ids auth v91583 pv91585 ap=4 recovering s=1048576 n(v0 rc2021-04-27T11:57:02.625542+0800 b1048576 1=1+0) (ifile mix->sync w=1) (iversion lock) cr={3198501=0-4194304@1,3198504=0-4194304@1,3221169=0-4194304@1} caps={3198501=pAsLsXsFr/pAsxXsxFsxcrwb@17,3198504=pAsLsXsFr/pAsxXsxFsxcrwb@9,3230166=pAsLsXsFr/pAsxXsxFsxcrwb@2} | ptrwaiter=1 request=1 lock=2 caps=1 dirty=1 waiter=1 authpin=1 0x56130c584a00]
> > > > 2021-04-27T11:57:37.027+0800 7fccbb3ba700 0 log_channel(cluster) log [WRN] : slow request 33.561029 seconds old, received at 2021-04-27T11:57:03.467164+0800: client_request(client.3230166:12 getattr pAsLsXsFs #0x100000003ed 2021-04-27T11:57:03.469426+0800 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> > > >
> > > > Any ideas or insights to help further investigate the issue are appreciated.
> > > >
> > > > - Jerry
> > > >
> >
> > --
> > Jeff Layton <jlayton@xxxxxxxxxx>