Re: CephFS kclient gets stuck when getattr() on a certain file

IIUC, when a client reboots and mounts again, it becomes a new client
for all intents and purposes. So if the MDS is still maintaining the
session from the old (pre-reboot) client, the new client will generally
need to wait until that session is evicted before it can grab any caps
that the old client held. This was one of the reasons we added some of
the reboot recovery machinery to libcephfs, to support the nfs-ganesha
client use-case.
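
For reference, the ganesha-style recovery dance through libcephfs looks
roughly like the sketch below. This is written from memory, so treat the
exact signatures, the CEPH_RECLAIM_RESET flag and the error handling as
approximate, and the uuid string is just an example:

/* rough sketch of the libcephfs session-reclaim sequence that
 * nfs-ganesha uses after a restart */
#include <cephfs/libcephfs.h>
#include <stdio.h>

static int mount_with_reclaim(const char *uuid)
{
        struct ceph_mount_info *cmount;
        int r;

        r = ceph_create(&cmount, NULL);
        if (r < 0)
                return r;
        ceph_conf_read_file(cmount, NULL);  /* default ceph.conf search path */
        ceph_init(cmount);

        /* identify this client instance with an ID that survives reboots */
        ceph_set_uuid(cmount, uuid);

        /* ask the MDS to tear down the previous incarnation's session
         * (same uuid) instead of waiting for it to time out */
        r = ceph_start_reclaim(cmount, uuid, CEPH_RECLAIM_RESET);
        if (r < 0)
                fprintf(stderr, "reclaim failed: %d\n", r);
        ceph_finish_reclaim(cmount);

        return ceph_mount(cmount, "/");
}

The kclient has no equivalent of the set_uuid/start_reclaim steps today,
which is why the remounted node is left waiting on the stale session.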

Assuming that's what is happening here, we might eventually be able to
improve on it by having kclients set an identity on the session at mount
time (à la ceph_set_uuid); the new client could then tell the MDS that it
is safe to release the state that its previous incarnation held. That
would mean generating a unique per-client ID that is invariant across
reboots, but we could consider it.
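
To make that concrete (purely hypothetical -- no such mount option
exists today), the idea would be something along the lines of:

    mount -t ceph mon1:/ /mnt/cephfs -o name=admin,client_uuid=$(cat /etc/machine-id)

...where the MDS, on seeing a new session present the same client_uuid
as an existing stale one, could blocklist and clean up the old session
instead of making the new mount wait for it to be evicted.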

-- Jeff

On Mon, 2021-05-03 at 14:20 -0700, Gregory Farnum wrote:
> I haven't looked at the logs, but it's expected that when a client
> disappears and it's holding caps, the MDS will wait through the
> session timeout period before revoking those capabilities. This means
> if all your clients are reading the file, writes will be blocked until
> the session timeout passes. The details of exactly what operations
> will be allowed vary quite a lot depending on the exact system state
> when the client disappeared (if it held write caps, most read
> operations will also be blocked and new clients trying to look at it
> will certainly be blocked).
> 
> I don't remember exactly how specific kernel client blocklists are,
> but there may be something going on there that makes things extra hard
> on the rebooted node if it's maintaining the same IP addresses.
> 
> If you have other monitoring software to detect failures, there are
> ways to evict clients before the session timeout passes (or you could
> have the rebooted node do so) and these are discussed in the docs.
> -Greg
> 
> On Tue, Apr 27, 2021 at 9:35 PM Jerry Lee <leisurelysw24@xxxxxxxxx> wrote:
> > 
> > Hi,
> > 
> > I deployed a 3-node Ceph cluster (v15.2.9), and the CephFS is mounted via
> > kclient (linux-4.14.24) on all 3 nodes.  Each kclient
> > updates (reads/writes) a certain file periodically in order to
> > know whether the CephFS is alive or not.  After a kclient gets evicted
> > due to an abnormal reboot, a new kclient mounts the CephFS when the
> > node comes back.  However, the newly mounted kclient gets stuck when
> > it tries to getattr on the common file.  Under such conditions, all of
> > the other kclients are affected and cannot update the common
> > file either.  From the debugfs entries, a request does get stuck:
> > ------
> > [/sys/kernel/debug/ceph/1bbb7753-85e5-4d33-a860-84419fdcfd7d.client3230166]
> > # cat mdsc
> > 12      mds0    getattr  #100000003ed
> > 
> > [/sys/kernel/debug/ceph/1bbb7753-85e5-4d33-a860-84419fdcfd7d.client3230166]
> > # cat osdc
> > REQUESTS 0 homeless 0
> > LINGER REQUESTS
> > BACKOFFS
> > 
> > [/sys/kernel/debug/ceph/1bbb7753-85e5-4d33-a860-84419fdcfd7d.client3230166]
> > # ceph -s
> >   cluster:
> >     id:     1bbb7753-85e5-4d33-a860-84419fdcfd7d
> >     health: HEALTH_WARN
> >             1 MDSs report slow requests
> > 
> >   services:
> >     mon: 3 daemons, quorum Jerry-ceph-n2,Jerry-x85-n1,Jerry-x85-n3 (age 23h)
> >     mgr: Jerry-x85-n1(active, since 25h), standbys: Jerry-ceph-n2, Jerry-x85-n3
> >     mds: cephfs:1 {0=qceph-mds-Jerry-ceph-n2=up:active} 1
> > up:standby-replay 1 up:standby
> >     osd: 18 osds: 18 up (since 23h), 18 in (since 23h)
> > ------
> > 
> > The MDS logs (debug_mds=20) are provided:
> > https://drive.google.com/file/d/1aj101NOTzCsfDdC-neqVTvKpEPOd3M6Q/view?usp=sharing
> > 
> > Some of the logs wrt client.3230166 and ino 0x100000003ed are shown below:
> > 2021-04-27T11:57:03.467+0800 7fccbd3be700  4 mds.0.server
> > handle_client_request client_request(client.3230166:12 getattr
> > pAsLsXsFs #0x100000003ed 2021-04-27T11:57:03.469426+0800 caller_uid=0,
> > caller_gid=0{}) v2
> > 2021-04-27T11:57:03.467+0800 7fccbd3be700 20 mds.0.98 get_session have
> > 0x56130c5ce480 client.3230166 v1:192.168.92.89:0/679429733 state open
> > 2021-04-27T11:57:03.467+0800 7fccbd3be700 15 mds.0.server  oldest_client_tid=12
> > 2021-04-27T11:57:03.467+0800 7fccbd3be700  7 mds.0.cache request_start
> > request(client.3230166:12 nref=2 cr=0x56130db96480)
> > 2021-04-27T11:57:03.467+0800 7fccbd3be700  7 mds.0.server
> > dispatch_client_request client_request(client.3230166:12 getattr
> > pAsLsXsFs #0x100000003ed 2021-04-27T11:57:03.469426+0800 caller_uid=0,
> > caller_gid=0{}) v2
> > 
> > 2021-04-27T11:57:03.467+0800 7fccbd3be700 10 mds.0.locker
> > acquire_locks request(client.3230166:12 nref=3 cr=0x56130db96480)
> > 2021-04-27T11:57:03.467+0800 7fccbd3be700 10
> > mds.0.cache.ino(0x100000003ed) auth_pin by 0x56130c584ed0 on [inode
> > 0x100000003ed [2,head]
> > /QTS/VOL_1/.ovirt_data_domain/37065419-e7f3-47ca-97df-8af0c67d30a0/dom_md/ids
> > auth v91583 pv91585 ap=4 recovering s=1048576 n(v0
> > rc2021-04-27T11:57:02.625542+0800 b1048576 1=1+0) (ifile mix->sync
> > w=1) (iversion lock)
> > cr={3198501=0-4194304@1,3198504=0-4194304@1,3221169=0-4194304@1}
> > caps={3198501=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@17,3198504=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@9,3230166=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@2}
> > | ptrwaiter=1 request=1 lock=2 caps=1 dirty=1 waiter=0 authpin=1
> > 0x56130c584a00] now 4
> > 2021-04-27T11:57:03.467+0800 7fccbd3be700 15
> > mds.0.cache.dir(0x100000003eb) adjust_nested_auth_pins 1 on [dir
> > 0x100000003eb /QTS/VOL_1/.ovirt_data_domain/37065419-e7f3-47ca-97df-8af0c67d30a0/dom_md/
> > [2,head] auth pv=91586 v=91584 cv=0/0 ap=1+4 state=1610874881|complete
> > f(v0 m2021-04-23T15:00:04.377198+0800 6=6+0) n(v3
> > rc2021-04-27T11:57:02.625542+0800 b38005818 6=6+0) hs=6+0,ss=0+0
> > dirty=4 | child=1 dirty=1 waiter=0 authpin=1 0x56130c586a00] by
> > 0x56130c584a00 count now 1/4
> > 2021-04-27T11:57:03.467+0800 7fccbd3be700 10 mds.0
> > RecoveryQueue::prioritize not queued [inode 0x100000003ed [2,head]
> > /QTS/VOL_1/.ovirt_data_domain/37065419-e7f3-47ca-97df-8af0c67d30a0/dom_md/ids
> > auth v91583 pv91585 ap=4 recovering s=1048576 n(v0
> > rc2021-04-27T11:57:02.625542+0800 b1048576 1=1+0) (ifile mix->sync
> > w=1) (iversion lock)
> > cr={3198501=0-4194304@1,3198504=0-4194304@1,3221169=0-4194304@1}
> > caps={3198501=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@17,3198504=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@9,3230166=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@2}
> > | ptrwaiter=1 request=1 lock=2 caps=1 dirty=1 waiter=0 authpin=1
> > 0x56130c584a00]
> > 2021-04-27T11:57:03.467+0800 7fccbd3be700  7 mds.0.locker rdlock_start
> > waiting on (ifile mix->sync w=1) on [inode 0x100000003ed [2,head]
> > /QTS/VOL_1/.ovirt_data_domain/37065419-e7f3-47ca-97df-8af0c67d30a0/dom_md/ids
> > auth v91583 pv91585 ap=4 recovering s=1048576 n(v0
> > rc2021-04-27T11:57:02.625542+0800 b1048576 1=1+0) (ifile mix->sync
> > w=1) (iversion lock)
> > cr={3198501=0-4194304@1,3198504=0-4194304@1,3221169=0-4194304@1}
> > caps={3198501=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@17,3198504=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@9,3230166=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@2}
> > | ptrwaiter=1 request=1 lock=2 caps=1 dirty=1 waiter=0 authpin=1
> > 0x56130c584a00]
> > 2021-04-27T11:57:03.467+0800 7fccbd3be700 10
> > mds.0.cache.ino(0x100000003ed) add_waiter tag 2000000040000000
> > 0x56130ea1bbe0 !ambig 1 !frozen 1 !freezing 1
> > 2021-04-27T11:57:03.467+0800 7fccbd3be700 15
> > mds.0.cache.ino(0x100000003ed) taking waiter here
> > 2021-04-27T11:57:03.468+0800 7fccbd3be700 20 mds.0.locker
> > client.3230166 pending pAsLsXsFr allowed pAsLsXsFrl wanted
> > pAsxXsxFsxcrwb
> > 2021-04-27T11:57:03.468+0800 7fccbd3be700  7 mds.0.locker
> > handle_client_caps  on 0x100000003ed tid 0 follows 0 op update flags
> > 0x2
> > 2021-04-27T11:57:03.468+0800 7fccbd3be700 20 mds.0.98 get_session have
> > 0x56130b81f600 client.3198501 v1:192.168.50.108:0/2478094748 state
> > open
> > 2021-04-27T11:57:03.468+0800 7fccbd3be700 10 mds.0.locker  head inode
> > [inode 0x100000003ed [2,head]
> > /QTS/VOL_1/.ovirt_data_domain/37065419-e7f3-47ca-97df-8af0c67d30a0/dom_md/ids
> > auth v91583 pv91585 ap=4 recovering s=1048576 n(v0
> > rc2021-04-27T11:57:02.625542+0800 b1048576 1=1+0) (ifile mix->sync
> > w=1) (iversion lock)
> > cr={3198501=0-4194304@1,3198504=0-4194304@1,3221169=0-4194304@1}
> > caps={3198501=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@17,3198504=pAsLsXsFr/pAsxXsxFsxcrwb@9,3230166=pAsLsXsFr/pAsxXsxFsxcrwb@2}
> > | ptrwaiter=1 request=1 lock=2 caps=1 dirty=1 waiter=1 authpin=1
> > 0x56130c584a00]
> > 2021-04-27T11:57:03.468+0800 7fccbd3be700 10 mds.0.locker  follows 0
> > retains pAsLsXsFr dirty - on [inode 0x100000003ed [2,head]
> > /QTS/VOL_1/.ovirt_data_domain/37065419-e7f3-47ca-97df-8af0c67d30a0/dom_md/ids
> > auth v91583 pv91585 ap=4 recovering s=1048576 n(v0
> > rc2021-04-27T11:57:02.625542+0800 b1048576 1=1+0) (ifile mix->sync
> > w=1) (iversion lock)
> > cr={3198501=0-4194304@1,3198504=0-4194304@1,3221169=0-4194304@1}
> > caps={3198501=pAsLsXsFr/pAsxXsxFsxcrwb@17,3198504=pAsLsXsFr/pAsxXsxFsxcrwb@9,3230166=pAsLsXsFr/pAsxXsxFsxcrwb@2}
> > | ptrwaiter=1 request=1 lock=2 caps=1 dirty=1 waiter=1 authpin=1
> > 0x56130c584a00]
> > 2021-04-27T11:57:37.027+0800 7fccbb3ba700  0 log_channel(cluster) log
> > [WRN] : slow request 33.561029 seconds old, received at
> > 2021-04-27T11:57:03.467164+0800: client_request(client.3230166:12
> > getattr pAsLsXsFs #0x100000003ed 2021-04-27T11:57:03.469426+0800
> > caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> > 
> > Any ideas or insights to help further investigate the issue are appreciated.
> > 
> > - Jerry
> > 
> 
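
(Jerry, in the meantime the manual eviction that Greg mentions is
probably the quickest workaround. From memory, so double-check the
exact syntax against the octopus docs, it's roughly:

    # find the stale session left behind by the pre-reboot client
    ceph tell mds.0 session ls

    # evict it so the new mount can get the caps it needs
    ceph tell mds.0 client evict id=<id of the stale session>

    # octopus still calls the blocklist a "blacklist"; if the rebooted
    # node reuses the same IP, check whether an old entry is lingering
    ceph osd blacklist ls

See the client eviction section of the CephFS documentation for the
details and the caveats around removing blacklist entries.)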

-- 
Jeff Layton <jlayton@xxxxxxxxxx>



