Same problem here with Ceph 17.2.6 on Ubuntu 22.04 and clients on Debian 11, kernel 6.0.12-1~bpo11+1. We are still looking for a solution. For the time being we restart the orchestrator-managed MDS daemons by removing and re-adding labels on the servers (a rough sketch of the commands is below the quoted message). We use multiple MDS daemons and have plenty of CPU cores and memory, so the problem should not be due to a lack of resources.

On Tue, 19 Sep 2023 at 13:36, Tim Bishop <tim-lists@xxxxxxxxxxx> wrote:
> Hi,
>
> I've seen this issue mentioned in the past, but with older releases, so
> I'm wondering if anybody has any pointers.
>
> The Ceph cluster is running Pacific 16.2.13 on Ubuntu 20.04. Almost all
> clients are working fine, with the exception of our backup server. This
> is using the kernel CephFS client on Ubuntu 22.04 with kernel 6.2.0 [1]
> (so I suspect a newer Ceph version?).
>
> The backup server has multiple (12) CephFS mount points. One of them,
> the busiest, regularly causes this error on the cluster:
>
> HEALTH_WARN 1 clients failing to respond to capability release
> [WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
>     mds.mds-server(mds.0): Client backupserver:cephfs-backupserver failing to respond to capability release client_id: 521306112
>
> And occasionally, which may be unrelated but occurs at the same time:
>
> [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
>     mds.mds-server(mds.0): 1 slow requests are blocked > 30 secs
>
> The second one clears itself, but the first sticks until I can unmount
> the filesystem on the client after the backup completes.
>
> It appears that whilst it's in this stuck state there may be one or more
> directory trees that are inaccessible to all clients. The backup server
> is walking the whole tree but never gets stuck itself, so either the
> directory becomes inaccessible after the backup has already passed it,
> or it's not affected. Maybe the backup server is holding a directory
> when it shouldn't?
>
> It may be that an upgrade to Quincy resolves this, since it's more
> likely to be in line with the kernel client version-wise, but I don't
> want to knee-jerk upgrade just to try and fix this problem.
>
> Thanks for any advice.
>
> Tim.
>
> [1] The reason for the newer kernel is that the backup performance from
> CephFS was terrible with older kernels. This newer kernel does at least
> resolve that issue.
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
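
For reference, a rough sketch of the label-cycling workaround mentioned above. It assumes the MDS service placement in cephadm is bound to a host label named "mds" and that "ceph-node1" is one of the MDS hosts; both names are placeholders, adjust them to your own labels and hostnames:

    # remove the label so the orchestrator removes the MDS daemon from that host
    ceph orch host label rm ceph-node1 mds
    # wait for a standby MDS to take over, then re-add the label
    # so the orchestrator redeploys the daemon
    ceph orch host label add ceph-node1 mds

Note this only bounces the MDS daemons and clears the warning temporarily; it does not address the underlying capability-release problem.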