Hi Dhairya,

this is the thing: the client appeared to be responsive and worked fine (the file system was online and responsive as usual). There was something off, though; see my response to Eugen.

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dhairya Parmar <dparmar@xxxxxxxxxx>
Sent: Wednesday, August 23, 2023 9:05 AM
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: Client failing to respond to capability release

Hi Frank,

This usually happens when the client is buggy or unresponsive. The warning is triggered when a client fails to respond to the MDS's request to release caps within the time window set by session_timeout (defaults to 60 secs). Did you make any config changes?

Dhairya Parmar
Associate Software Engineer, CephFS
Red Hat Inc. <https://www.redhat.com/>
dparmar@xxxxxxxxxx

On Tue, Aug 22, 2023 at 9:12 PM Frank Schilder <frans@xxxxxx> wrote:

Hi all,

I have had this warning the whole day already (latest Octopus cluster):

HEALTH_WARN 4 clients failing to respond to capability release; 1 pgs not deep-scrubbed in time
[WRN] MDS_CLIENT_LATE_RELEASE: 4 clients failing to respond to capability release
    mds.ceph-24(mds.1): Client sn352.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 145698301
    mds.ceph-24(mds.1): Client sn463.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 189511877
    mds.ceph-24(mds.1): Client sn350.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 189511887
    mds.ceph-24(mds.1): Client sn403.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 231250695

If I look at the session info from mds.1 for these clients, I see this:

# ceph tell mds.1 session ls | jq -c '[.[] | {id: .id, h: .client_metadata.hostname, addr: .inst, fs: .client_metadata.root, caps: .num_caps, req: .request_load_avg}]|sort_by(.caps)|.[]' | grep -e 145698301 -e 189511877 -e 189511887 -e 231250695
{"id":189511887,"h":"sn350.hpc.ait.dtu.dk","addr":"client.189511887 v1:192.168.57.221:0/4262844211","fs":"/hpc/groups","caps":2,"req":0}
{"id":231250695,"h":"sn403.hpc.ait.dtu.dk","addr":"client.231250695 v1:192.168.58.18:0/1334540218","fs":"/hpc/groups","caps":3,"req":0}
{"id":189511877,"h":"sn463.hpc.ait.dtu.dk","addr":"client.189511877 v1:192.168.58.78:0/3535879569","fs":"/hpc/groups","caps":4,"req":0}
{"id":145698301,"h":"sn352.hpc.ait.dtu.dk","addr":"client.145698301 v1:192.168.57.223:0/2146607320","fs":"/hpc/groups","caps":7,"req":0}

We have mds_min_caps_per_client=4096, so the limit is well satisfied. Also, the file system is pretty idle at the moment. Why and what exactly is the MDS complaining about here?

Thanks and best regards.
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
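A quick way to verify the two settings discussed in this thread (a sketch only; the file system name "con-fs2" is an assumption inferred from the client mount names above, so substitute your own):

# ceph fs get con-fs2 | grep session_timeout
# ceph config get mds mds_min_caps_per_client

Note that session_timeout lives in the fs map rather than the MDS config, so if slow clients legitimately need more time it can be raised with "ceph fs set con-fs2 session_timeout <secs>".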
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
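A related client-side check, assuming the clients are Linux kernel mounts and debugfs is mounted on its usual path: the kernel client exposes its cap state under debugfs, which lets you confirm on the client itself whether caps are really being held back.

# ls /sys/kernel/debug/ceph/
# cat /sys/kernel/debug/ceph/*/caps

The per-mount directory names have the form <fsid>.client<id>, so the client_id values from the health warning can be matched to a specific mount.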