Re: Ceph-fuse getting stuck with "currently failed to authpin local pins"

I had the exact opposite experience with the same error message, "currently failed to authpin local pins". A few of our clients were on ceph-fuse 12.2.2 and ran into those issues a lot (evicting works). Upgrading to ceph-fuse 12.2.5 fixed it. The main cluster is on 12.2.4.
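
In case it helps others hitting the same thing: what worked for us, roughly, was finding the stuck session on the active MDS and evicting it. The MDS rank and client id below are placeholders, so adjust them to your setup:

    # list client sessions on the active MDS and find the stuck one
    ceph tell mds.0 client ls
    # evict it by id (note: by default this also blacklists the client on the OSDs)
    ceph tell mds.0 client evict id=<client-id>

After that the ceph-fuse mount usually needs a remount (or, as below, a reboot of the client node) to recover.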


The trigger in our case is a user's HPC jobs, or even just their logins on multiple nodes, accessing the same files in a particular way. It doesn't happen to other users. We haven't dug into it deeply, since upgrading to 12.2.5 fixed our problem.
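
To Oliver's question further down about collecting debug information if it happens again: a rough starting point (the MDS name and the client admin-socket path are placeholders here) might be to dump state on both sides before evicting anything, e.g.:

    # on the active MDS: stuck requests and session state
    ceph daemon mds.<name> dump_ops_in_flight
    ceph daemon mds.<name> session ls
    # on the affected ceph-fuse client: in-flight MDS requests and sessions
    ceph daemon /var/run/ceph/ceph-client.<name>.asok mds_requests
    ceph daemon /var/run/ceph/ceph-client.<name>.asok mds_sessions
    # optionally raise MDS logging while it is stuck
    ceph tell mds.<name> injectargs '--debug_mds 20 --debug_ms 1'

That should at least show which requests are stuck and on which inodes, which makes it much easier to report upstream.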


From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx>
Sent: Tuesday, 29 May 2018 7:29:06 AM
To: Paul Emmerich
Cc: Ceph Users; Peter Wienemann
Subject: Re: Ceph-fuse getting stuck with "currently failed to authpin local pins"
 
Dear Paul,

Am 28.05.2018 um 20:16 schrieb Paul Emmerich:
> I encountered the exact same issue earlier today immediately after upgrading a customer's cluster from 12.2.2 to 12.2.5.
> I've evicted the session and restarted the ganesha client to fix it, as I also couldn't find any obvious problem.

interesting! In our case, the client with the problem (it happened again a few hours later...) was always a ceph-fuse client. Evicting / rebooting the client node helped.
However, it may well be that the original issue was caused by a Ganesha client, which we also use (the user in question who complained was accessing files in parallel via NFS and ceph-fuse),
but I don't have a clear indication of that.

Cheers,
        Oliver

>
> Paul
>
>     2018-05-28 16:38 GMT+02:00 Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx>:
>
>     Dear Cephalopodians,
>
>     we just had a "lockup" of many MDS requests, and also trimming fell behind, for over 2 days.
>     One of the clients (all ceph-fuse 12.2.5 on CentOS 7.5) was in status "currently failed to authpin local pins". Metadata pool usage did grow by 10 GB in those 2 days.
>
>     Rebooting the node to force a client eviction solved the issue, and now metadata usage is down again, and all stuck requests were processed quickly.
>
>     Does anyone have an idea what could cause something like that? On the client, there was no CPU load, but many processes were waiting for cephfs to respond.
>     Syslog did not yield anything. It only affected one user and his user directory.
>
>     If there are no ideas: How can I collect good debug information in case this happens again?
>
>     Cheers,
>             Oliver
>
>
>     _______________________________________________
>     ceph-users mailing list
>     ceph-users@xxxxxxxxxxxxxx
>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
