Re: Clients failing to respond to capability release

Tim Bishop <tim-lists@xxxxxxxxxxx> · Wed, 20 Sep 2023 14:25:20 +0100

Hi Stefan,

On Wed, Sep 20, 2023 at 11:00:12AM +0200, Stefan Kooman wrote:
> On 19-09-2023 13:35, Tim Bishop wrote:
> > The Ceph cluster is running Pacific 16.2.13 on Ubuntu 20.04. Almost all
> > clients are working fine, with the exception of our backup server. This
> > is using the kernel CephFS client on Ubuntu 22.04 with kernel 6.2.0 [1]
> > (so I suspect a newer Ceph version?).
> > 
> > The backup server has multiple (12) CephFS mount points. One of them,
> > the busiest, regularly causes this error on the cluster:
> > 
> > HEALTH_WARN 1 clients failing to respond to capability release
> > [WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
> >      mds.mds-server(mds.0): Client backupserver:cephfs-backupserver failing to respond to capability release client_id: 521306112
> > 
> > And occasionally, which may be unrelated, but occurs at the same time:
> > 
> > [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
> >      mds.mds-server(mds.0): 1 slow requests are blocked > 30 secs
> > 
> > The second one clears itself, but the first sticks until I can unmount
> > the filesystem on the client after the backup completes.
> 
> You are not alone. We also have a backup server running 22.04 and 6.2 and
> occasionally hit this issue. We hit this with mainly 5.12.19 clients and a
> 6.2 backup server. We're on 16.2.11.
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Sidenote:
> 
> For those of you who are wondering: why would you want to use latest
> (greatest?) linux kernel for CephFS ... this is why. To try to get rid of 1)
> slow requests because of some deadlock / locking issue, 2) clients failing to
> respond to capability release, and 3) bug fixes / improvements (thx devs!).
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Questions:
> 
> Do you have the filesystem read only mounted and given the backup server
> CephFS client read only caps on the MDS?

Yes, mounted read-only and the caps for the client are read-only for the
MDS.

I do have multiple mounts from the same CephFS filesystem though, and
I've been wondering if that could be causing more parallel requests from
the backup server. I'd been thinking about doing it through a single
mount, but then all the paths change which doesn't make the backups
overly happy.

> Are you running a multiple active MDS setup?

No. We tried it for a while but after seeing some issues like this we
backtracked to a single active MDS to rule out multiple active being the
issue.

> > It appears that whilst it's in this stuck state there may be one or more
> > directory trees that are inaccessible to all clients. The backup server
> > is walking the whole tree but never gets stuck itself, so either the
> > inaccessible directory entry is caused after it has gone past, or it's
> > not affected. Maybe the backup server is holding a directory when it
> > shouldn't?
> 
> We have seen both cases, yet most of the time the backup server would not be
> able to make progress and be stuck on a file.

Interesting. Backups have never got stuck for us, even though we see the
above-mentioned error regularly, pretty much daily.

But because nothing we're running directly gets stuck, I only find out
that a directory somewhere is inaccessible when a user reports it to us
from one of our other client machines, usually an HPC node.
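Rather than waiting for a user report, the warning itself could be picked
up from the text of "ceph health detail". The following is only a minimal
sketch: the pattern is modelled on the single warning line quoted earlier in
this message, and the function name late_release_clients is made up for
illustration. Other Ceph releases may word the line differently.

```python
import re

# Modelled on the detail line quoted above for MDS_CLIENT_LATE_RELEASE:
#   mds.mds-server(mds.0): Client backupserver:cephfs-backupserver failing to
#   respond to capability release client_id: 521306112
# Assumption: other releases may format this line differently.
LATE_RELEASE = re.compile(
    r"(?P<mds>mds\.\S+?)\(mds\.(?P<rank>\d+)\): Client (?P<client>\S+) "
    r"failing to respond to capability release client_id: (?P<client_id>\d+)"
)


def late_release_clients(health_detail: str) -> list[dict]:
    """Return one dict per client named in a late-release warning line."""
    found = []
    for line in health_detail.splitlines():
        m = LATE_RELEASE.search(line)
        if m:
            found.append({
                "mds": m["mds"],
                "rank": int(m["rank"]),
                "client": m["client"],
                "client_id": int(m["client_id"]),
            })
    return found
```

Feeding it the saved output of "ceph health detail" from a cron job would
give the client_id to look up on the MDS while the problem is still live.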

> > It may be that an upgrade to Quincy resolves this, since it's more
> > likely to be inline with the kernel client version wise, but I don't
> > want to knee-jerk upgrade just to try and fix this problem.
> 
> We are testing with 6.5 kernel clients (see other recent threads about
> this). We have not seen this issue there (but time will tell, it does not
> happen *that* often, but hit other issues).
> 
> The MDS server itself is indeed older than the newer kernel clients. It
> might certainly be a factor. And that raises the question what kind of
> interoperability / compatibility tests (if any) are done between CephFS
> (kernel) clients and MDS server versions. This might be a good "focus topic"
> for a ceph User + Dev meeting ...
> 
> > Thanks for any advice.
> 
> You might want to try 6.5.x kernel on the clients. But might run into other
> issues. Not sure about that, these might be only relevant for one of our
> workloads, only one way to find out ...

I've been sticking with what's available in Ubuntu - the 6.2 kernel is
part of their HWE (hardware enablement) stack, which is handy. It won't be long
until 23.10 is out with the 6.5 kernel though. I'll definitely give it a
try then.

> Enable debug logging on the MDS to gather logs that might shine some light
> on what is happening with that request.
> 
> ceph daemon mds.name dump_ops_in_flight might help here to get client id and
> request.

I've done both of these in the past, but I should look again (of course,
it's not broken right now!). From what I recall there was nothing
unusual-looking about the request, and Googling and searching the list
archives and bug reports didn't lead me to anything useful.
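For next time it happens, a small filter over the JSON printed by
"ceph daemon mds.name dump_ops_in_flight" might make the blocked request
easier to spot. This is a sketch under assumptions: the field names "ops",
"age", "description" and "type_data"/"flag_point" are what I remember the
output looking like and should be checked against a real dump, and the
function name slow_ops is hypothetical.

```python
import json


def slow_ops(dump_json: str, min_age: float = 30.0) -> list[dict]:
    """Pick out ops at least min_age seconds old from dump_ops_in_flight JSON.

    Assumed layout (verify against real output): a top-level "ops" list whose
    entries carry "age", "description" and a "type_data" dict with "flag_point".
    """
    dump = json.loads(dump_json)
    picked = []
    for op in dump.get("ops", []):
        age = float(op.get("age", 0.0))
        if age < min_age:
            continue
        type_data = op.get("type_data", {})
        picked.append({
            "age": age,
            "description": op.get("description", ""),
            "flag_point": type_data.get("flag_point", ""),
        })
    # Oldest first: the longest-blocked request is usually the one to look at.
    return sorted(picked, key=lambda o: o["age"], reverse=True)
```

The 30 second default matches the threshold in the MDS_SLOW_REQUEST warning
quoted above. The description of each op should include the client id, which
can then be compared with the one in the capability release warning.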

> Another thing that you might do is to dump the cache on the MDS to gather
> more info. This however is highly dependent on the amount of RAM the MDS is
> using. In the past we would kill the MDS (unresponsive, replaced by
> standby-replay). Improvements to prevent that have been made ... but we have
> not tried after that. See this thread [1]. What MDS_MEMORY_TARGET have you
> set? Make sure you have enough disk space to store the dump file. To
> actually make sense of that dump file / debug logging you should understand
> _exactly_ how the CAPS mechanism works, and see if it is violated somewhere
> ... and then look in the code to see why. Short of that knowledge, the
> CephFS developers might help out.

I did try that before too, but ran out of disk space. Current MDS memory
usage is around 16GB, with a cache limit of 8GB set. This is likely
straying beyond my understanding of Ceph though.

Thanks for your advice. Looks like giving a newer kernel a try is
something to consider. Also we'll need to be looking at Quincy soon
anyway, so that might mix things up a bit too. It's just about
manageable at the moment, but it needs more hand-holding than I'd
really like to be giving.

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x6C226B37FDF38D55