Same problem here with Ceph 17.2.6 on Ubuntu 22.04 and clients on Debian 11, kernel 6.0.12-1~bpo11+1. We are still looking for a solution. For the time being we restart the orchestrator-managed MDS daemons by removing and re-adding labels on the servers (a rough sketch of the commands is below the quoted message). We use multiple MDS daemons and have plenty of CPU cores and memory, so the problem should not be due to a lack of resources.

On Tue, 19 Sep 2023 at 13:36, Tim Bishop <tim-lists@xxxxxxxxxxx> wrote:
> Hi,
>
> I've seen this issue mentioned in the past, but with older releases, so
> I'm wondering if anybody has any pointers.
>
> The Ceph cluster is running Pacific 16.2.13 on Ubuntu 20.04. Almost all
> clients are working fine, with the exception of our backup server. This
> is using the kernel CephFS client on Ubuntu 22.04 with kernel 6.2.0 [1]
> (so I suspect a newer Ceph version?).
>
> The backup server has multiple (12) CephFS mount points. One of them,
> the busiest, regularly causes this error on the cluster:
>
> HEALTH_WARN 1 clients failing to respond to capability release
> [WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
>     mds.mds-server(mds.0): Client backupserver:cephfs-backupserver failing to respond to capability release client_id: 521306112
>
> And occasionally, which may be unrelated but occurs at the same time:
>
> [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
>     mds.mds-server(mds.0): 1 slow requests are blocked > 30 secs
>
> The second one clears itself, but the first sticks until I can unmount
> the filesystem on the client after the backup completes.
>
> It appears that whilst it's in this stuck state there may be one or more
> directory trees that are inaccessible to all clients. The backup server
> is walking the whole tree but never gets stuck itself, so either the
> directory becomes inaccessible after the backup has already passed it,
> or it's not affected. Maybe the backup server is holding a directory
> when it shouldn't?
>
> It may be that an upgrade to Quincy resolves this, since it's more
> likely to be in line with the kernel client version-wise, but I don't
> want to knee-jerk upgrade just to try and fix this problem.
>
> Thanks for any advice.
>
> Tim.
>
> [1] The reason for the newer kernel is that the backup performance from
> CephFS was terrible with older kernels. This newer kernel does at least
> resolve that issue.
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
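
For reference, a rough sketch of the label-cycling workaround mentioned above. It assumes the MDS service placement in cephadm is bound to a host label named "mds" and that "ceph-node1" is one of the MDS hosts; both names are placeholders, adjust them to your own labels and hostnames:

    # remove the label so the orchestrator removes the MDS daemon from that host
    ceph orch host label rm ceph-node1 mds
    # wait for a standby MDS to take over, then re-add the label
    # so the orchestrator redeploys the daemon
    ceph orch host label add ceph-node1 mds

Note this only bounces the MDS daemons and clears the warning temporarily; it does not address the underlying capability-release problem.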