Re: Client failing to respond to capability release

Hi Eugen, thanks for that :D

This time it was something different, possibly a bug in the kclient. On the affected nodes I found sync commands stuck in D-state. I guess a file or directory could not be synced, or there was some kind of corruption of buffered data. We had to reboot the servers to clear that out.
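
For anyone who hits the same thing: a quick way to spot such tasks is to look for processes in uninterruptible sleep, roughly like this (illustrative commands, not a transcript from our nodes; <PID> is a placeholder):

# ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'   # list D-state tasks and the kernel symbol they are waiting in
# cat /proc/<PID>/stack                                      # as root: kernel stack showing where the task is blocked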

On first inspection these clients looked OK; only deeper debugging revealed that something was off.
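
For reference, checks along these lines can reveal it (the client id is one taken from the session output quoted below; the debugfs files exist only on kernel clients and need root):

# ceph tell mds.1 session ls | jq '.[] | select(.id==145698301)'   # full session record for one client
# cat /sys/kernel/debug/ceph/*/mdsc                                # on the client: requests outstanding against the MDS
# cat /sys/kernel/debug/ceph/*/caps                                # on the client: caps currently held by the kclient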

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Wednesday, August 23, 2023 8:55 AM
To: ceph-users@xxxxxxx
Subject:  Re: Client failing to respond to capability release

Hi,

pointing you to your own thread [1] ;-)

[1]
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/HFILR5NMUCEZH7TJSGSACPI4P23XTULI/

Quoting Frank Schilder <frans@xxxxxx>:

> Hi all,
>
> I have been seeing this warning all day (latest Octopus cluster):
>
> HEALTH_WARN 4 clients failing to respond to capability release; 1
> pgs not deep-scrubbed in time
> [WRN] MDS_CLIENT_LATE_RELEASE: 4 clients failing to respond to
> capability release
>     mds.ceph-24(mds.1): Client sn352.hpc.ait.dtu.dk:con-fs2-hpc
> failing to respond to capability release client_id: 145698301
>     mds.ceph-24(mds.1): Client sn463.hpc.ait.dtu.dk:con-fs2-hpc
> failing to respond to capability release client_id: 189511877
>     mds.ceph-24(mds.1): Client sn350.hpc.ait.dtu.dk:con-fs2-hpc
> failing to respond to capability release client_id: 189511887
>     mds.ceph-24(mds.1): Client sn403.hpc.ait.dtu.dk:con-fs2-hpc
> failing to respond to capability release client_id: 231250695
>
> If I look at the session info from mds.1 for these clients I see this:
>
> # ceph tell mds.1 session ls | jq -c '[.[] | {id: .id, h:
> .client_metadata.hostname, addr: .inst, fs: .client_metadata.root,
> caps: .num_caps, req: .request_load_avg}]|sort_by(.caps)|.[]' | grep
> -e 145698301 -e 189511877 -e 189511887 -e 231250695
> {"id":189511887,"h":"sn350.hpc.ait.dtu.dk","addr":"client.189511887
> v1:192.168.57.221:0/4262844211","fs":"/hpc/groups","caps":2,"req":0}
> {"id":231250695,"h":"sn403.hpc.ait.dtu.dk","addr":"client.231250695
> v1:192.168.58.18:0/1334540218","fs":"/hpc/groups","caps":3,"req":0}
> {"id":189511877,"h":"sn463.hpc.ait.dtu.dk","addr":"client.189511877
> v1:192.168.58.78:0/3535879569","fs":"/hpc/groups","caps":4,"req":0}
> {"id":145698301,"h":"sn352.hpc.ait.dtu.dk","addr":"client.145698301
> v1:192.168.57.223:0/2146607320","fs":"/hpc/groups","caps":7,"req":0}
>
> We have mds_min_caps_per_client=4096, and these clients hold only a few
> caps each, so that limit is easily satisfied. Also, the file system is
> pretty idle at the moment.
>
> What exactly is the MDS complaining about here, and why?
>
> Thanks and best regards.
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx