I had the exact opposite experience with the same error message, "currently failed to authpin local pins". We had a few clients on ceph-fuse 12.2.2 and they ran into this issue a lot (evicting works). Upgrading to ceph-fuse 12.2.5 fixed it. The main cluster is on 12.2.4.
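In case it helps, evicting the stuck session without rebooting the node looks roughly like this (commands as documented for Luminous; mds.<name>, <rank> and <client-id> are placeholders to fill in from your cluster):

    # find the stuck session and the ops blocked on "failed to authpin"
    ceph daemon mds.<name> session ls
    ceph daemon mds.<name> dump_ops_in_flight
    # evict the offending client by id; it has to remount afterwards
    ceph tell mds.<rank> client evict id=<client-id>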
The cause is users' HPC jobs, or even just their logins on multiple nodes, accessing the same files in a particular way. It doesn't happen to other users. We haven't dug into it deeply enough, as upgrading to 12.2.5 fixed our problem.

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx>
Sent: Tuesday, 29 May 2018 7:29:06 AM
To: Paul Emmerich
Cc: Ceph Users; Peter Wienemann
Subject: Re: Ceph-fuse getting stuck with "currently failed to authpin local pins"

Dear Paul,
On 28.05.2018 at 20:16, Paul Emmerich wrote:
> I encountered the exact same issue earlier today immediately after upgrading a customer's cluster from 12.2.2 to 12.2.5.
> I've evicted the session and restarted the ganesha client to fix it, as I also couldn't find any obvious problem.

Interesting! In our case, the client with the problem (it happened again a few hours later...) was always a ceph-fuse client. Evicting / rebooting the client node helped.
However, it may well be that the original issue was caused by a Ganesha client, which we also use (and the user in question who complained was accessing files in parallel via NFS and ceph-fuse), but I don't have a clear indication of that.

Cheers,
Oliver

> Paul
>
> 2018-05-28 16:38 GMT+02:00 Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx>:
>
>     Dear Cephalopodians,
>
>     we just had a "lockup" of many MDS requests, and also trimming fell behind, for over 2 days.
>     One of the clients (all ceph-fuse 12.2.5 on CentOS 7.5) was in status "currently failed to authpin local pins". Metadata pool usage did grow by 10 GB in those 2 days.
>
>     Rebooting the node to force a client eviction solved the issue, and now metadata usage is down again, and all stuck requests were processed quickly.
>
>     Is there any idea on what could cause something like that? On the client, there was no CPU load, but many processes were waiting for cephfs to respond.
>     Syslog did not yield anything. It only affected one user and his user directory.
>
>     If there are no ideas: How can I collect good debug information in case this happens again?
>
>     Cheers,
>     Oliver
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
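Regarding the question above about collecting debug information the next time this happens: dumping the stuck requests from both sides via the admin sockets is usually the most useful starting point. A rough sketch (daemon names and the client socket path are placeholders, adjust to your deployment):

    # on the active MDS: which ops are stuck, and for which client/inode
    ceph daemon mds.<name> dump_ops_in_flight
    ceph daemon mds.<name> dump_blocked_ops
    ceph daemon mds.<name> objecter_requests
    # on the affected ceph-fuse client: pending MDS requests and session state
    ceph daemon /var/run/ceph/ceph-client.<id>.asok mds_requests
    ceph daemon /var/run/ceph/ceph-client.<id>.asok mds_sessions
    # optionally raise MDS logging while the hang is ongoing
    ceph daemon mds.<name> config set debug_mds 10

That should at least show which client is holding the authpins and which inode the requests are stuck on.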
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com