Re: Ceph-fuse getting stuck with "currently failed to authpin local pins"

I had the exact opposite experience with the same error message, "currently failed to authpin local pins". A few of our clients were on ceph-fuse 12.2.2 and ran into those issues a lot (evicting works). Upgrading to ceph-fuse 12.2.5 fixed it. The main cluster is on 12.2.4.
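
In case it helps others hitting the same thing: what worked for us, roughly, was finding the stuck session on the active MDS and evicting it. The MDS rank and client id below are placeholders, so adjust them to your setup:

    # list client sessions on the active MDS and find the stuck one
    ceph tell mds.0 client ls
    # evict it by id (note: by default this also blacklists the client on the OSDs)
    ceph tell mds.0 client evict id=<client-id>

After that the ceph-fuse mount usually needs a remount (or, as below, a reboot of the client node) to recover.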


The trigger in our case is a user's HPC jobs, or even just their logins on multiple nodes, accessing the same files in a particular way. It doesn't happen to other users. We haven't dug into it deeply, since upgrading to 12.2.5 fixed our problem.
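
To Oliver's question further down about collecting debug information if it happens again: a rough starting point (the MDS name and the client admin-socket path are placeholders here) might be to dump state on both sides before evicting anything, e.g.:

    # on the active MDS: stuck requests and session state
    ceph daemon mds.<name> dump_ops_in_flight
    ceph daemon mds.<name> session ls
    # on the affected ceph-fuse client: in-flight MDS requests and sessions
    ceph daemon /var/run/ceph/ceph-client.<name>.asok mds_requests
    ceph daemon /var/run/ceph/ceph-client.<name>.asok mds_sessions
    # optionally raise MDS logging while it is stuck
    ceph tell mds.<name> injectargs '--debug_mds 20 --debug_ms 1'

That should at least show which requests are stuck and on which inodes, which makes it much easier to report upstream.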


From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx>
Sent: Tuesday, 29 May 2018 7:29:06 AM
To: Paul Emmerich
Cc: Ceph Users; Peter Wienemann
Subject: Re: Ceph-fuse getting stuck with "currently failed to authpin local pins"
 
Dear Paul,

Am 28.05.2018 um 20:16 schrieb Paul Emmerich:
> I encountered the exact same issue earlier today immediately after upgrading a customer's cluster from 12.2.2 to 12.2.5.
> I've evicted the session and restarted the ganesha client to fix it, as I also couldn't find any obvious problem.

interesting! In our case, the client with the problem (it happened again a few hours later...) was always a ceph-fuse client. Evicting / rebooting the client node helped.
However, it may well be that the original issue was caused by a Ganesha client, which we also use (the user in question who complained was accessing files in parallel via NFS and ceph-fuse),
but I don't have a clear indication of that.

Cheers,
        Oliver

>
> Paul
>
>     2018-05-28 16:38 GMT+02:00 Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx>:
>
>     Dear Cephalopodians,
>
>     we just had a "lockup" of many MDS requests, and also trimming fell behind, for over 2 days.
>     One of the clients (all ceph-fuse 12.2.5 on CentOS 7.5) was in status "currently failed to authpin local pins". Metadata pool usage did grow by 10 GB in those 2 days.
>
>     Rebooting the node to force a client eviction solved the issue, and now metadata usage is down again, and all stuck requests were processed quickly.
>
>     Does anyone have an idea what could cause something like that? On the client, there was no CPU load, but many processes were waiting for cephfs to respond.
>     Syslog did not yield anything. It only affected one user and his user directory.
>
>     If there are no ideas: How can I collect good debug information in case this happens again?
>
>     Cheers,
>             Oliver
>
>
>     _______________________________________________
>     ceph-users mailing list
>     ceph-users@xxxxxxxxxxxxxx
>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
