Re: Ceph-fuse getting stuck with "currently failed to authpin local pins"

"Yan, Zheng" <ukernel@xxxxxxxxx> · Wed, 30 May 2018 16:37:10 +0800

On Wed, May 30, 2018 at 3:04 PM, Oliver Freyermuth
<freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> In our case, there's only a single active MDS
> (+1 standby-replay + 1 standby).
> We also get the health warning when it happens.
>

Were there "client.xxx isn't responding to mclientcaps(revoke)"
warnings in the cluster log? Please send them to me if there were.
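A minimal Python sketch for pulling these warnings out of a saved copy of the cluster log. It assumes each warning sits on a single line and names the client as `client.<number>`, matching the "client.xxx" placeholder above. Any line layout beyond the quoted phrase is an assumption, and `revoke_warnings` is a hypothetical helper, not a Ceph tool.

```python
import re
from collections import Counter

# Assumption: the client appears as "client.<number>" directly before the
# warning phrase quoted in this thread.
REVOKE_RE = re.compile(r"(client\.\d+) isn't responding to mclientcaps\(revoke\)")


def revoke_warnings(log_lines):
    """Count "isn't responding to mclientcaps(revoke)" warnings per client."""
    counts = Counter()
    for line in log_lines:
        match = REVOKE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "client.4305 isn't responding to mclientcaps(revoke) (made-up line)",
        "an unrelated log line",
        "client.4305 isn't responding to mclientcaps(revoke) (made-up line)",
    ]
    print(dict(revoke_warnings(sample)))
```

To scan a real log, pass an open file object instead of the sample list, since iterating a file yields its lines.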

> Cheers,
> Oliver
>
> On 30.05.2018 at 03:25, Yan, Zheng wrote:
>> It could be http://tracker.ceph.com/issues/24172
>>
>>
>> On Wed, May 30, 2018 at 9:01 AM, Linh Vu <vul@xxxxxxxxxxxxxx> wrote:
>>> In my case, I have multiple active MDSs (with directory pinning at the very
>>> top level), and there is a "Client xxx failing to respond to capability
>>> release" health warning every single time this happens.
>>>
>>> ________________________________
>>> From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Yan, Zheng
>>> <ukernel@xxxxxxxxx>
>>> Sent: Tuesday, 29 May 2018 9:53:43 PM
>>> To: Oliver Freyermuth
>>> Cc: Ceph Users; Peter Wienemann
>>> Subject: Re:  Ceph-fuse getting stuck with "currently failed to
>>> authpin local pins"
>>>
>>> Single or multiple active MDS? Was there a "Client xxx failing to
>>> respond to capability release" health warning?
>>>
>>> On Mon, May 28, 2018 at 10:38 PM, Oliver Freyermuth
>>> <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
>>>> Dear Cephalopodians,
>>>>
>>>> we just had a "lockup" of many MDS requests lasting over 2 days, and
>>>> trimming also fell behind.
>>>> One of the clients (all ceph-fuse 12.2.5 on CentOS 7.5) was in status
>>>> "currently failed to authpin local pins". Metadata pool usage grew by 10 GB
>>>> in those 2 days.
>>>>
>>>> Rebooting the node to force a client eviction solved the issue: metadata
>>>> usage is down again, and all stuck requests were processed quickly.
>>>>
>>>> Does anyone have an idea what could cause something like that? On the
>>>> client, there was no CPU load, but many processes were waiting for CephFS
>>>> to respond. Syslog did not yield anything. It only affected one user and
>>>> his user directory.
>>>>
>>>> If there are no ideas: How can I collect good debug information in case
>>>> this happens again?
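One way to capture the stuck requests next time, sketched in Python: save the JSON output of the MDS admin socket command `ceph daemon mds.<name> dump_ops_in_flight` while the hang is ongoing, then filter it for ops stuck in "failed to authpin local pins". The field names used below (`ops`, `age`, `description`, `type_data.flag_point`) are assumptions about the dump layout and should be checked against real output; `stuck_ops` is a hypothetical helper.

```python
import json


def stuck_ops(dump_json, flag="failed to authpin local pins", min_age=0.0):
    """Return (age, description) pairs for ops in the given state, oldest first.

    Assumption: the dump is an object with an "ops" list, and each op carries
    "age" (seconds), "description", and "type_data" -> "flag_point".
    """
    data = json.loads(dump_json)
    hits = []
    for op in data.get("ops", []):
        flag_point = op.get("type_data", {}).get("flag_point", "")
        age = float(op.get("age", 0.0))
        if flag_point == flag and age >= min_age:
            hits.append((age, op.get("description", "")))
    # Oldest (largest age) first, so the longest-blocked requests lead the list.
    hits.sort(reverse=True)
    return hits
```

Running the dump a few times during an incident and keeping the files would show whether the same requests stay stuck, which is the kind of detail the MDS developers asked about elsewhere in this thread.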
>>>>
>>>> Cheers,
>>>>         Oliver
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>>
>>>> https://protect-au.mimecast.com/s/Zl9aCXLKNwFxY9nNc6jQJC?domain=lists.ceph.com
>>>>
>>>
>
>