Re: mds slow request with "failed to authpin, subtree is being exported"

There are some unhandled race conditions in multi-active MDS clusters that surface in rare circumstances.
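
If it happens again, you can inspect the blocked requests and the subtree map on the MDS admin socket before resorting to a restart; a minimal sketch, assuming the daemon is named mds.ceph-01 (replace with your own daemon name):

  # list in-flight/slow ops, including the one stuck on "failed to authpin"
  ceph daemon mds.ceph-01 dump_ops_in_flight
  # dump the subtree map known to this MDS (which trees it holds or is exporting)
  ceph daemon mds.ceph-01 get subtrees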

We had this issue with Mimic and Octopus, and it went away after manually pinning sub-directories to MDS ranks; see https://docs.ceph.com/en/nautilus/cephfs/multimds/?highlight=dir%20pin#manually-pinning-directory-trees-to-a-particular-rank.
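
For reference, a minimal sketch of such a pin; the mount point, path and rank below are placeholders, adapt them to your tree:

  # pin /mnt/cephfs/home/alice and everything below it to MDS rank 1
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/home/alice
  # a value of -1 reverts the tree to the default balancer policy
  setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/home/alice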

This has the added advantage that one can bypass the internal load balancer, which was horrible for our workloads. I wrote a related post about ephemeral pinning on this list a year or two ago; you should be able to find it. Short story: after manually pinning all user directories to ranks, all our problems disappeared and performance improved a lot. Average MDS load dropped from 130% to 10-20%, and memory consumption and cache recycling dropped as well.
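
For completeness, a sketch of the ephemeral variant mentioned above (assuming a release recent enough to support it; the path is again a placeholder):

  # distribute the immediate children of /mnt/cephfs/home across the active ranks
  setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/home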

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Wednesday, November 22, 2023 12:30 PM
To: ceph-users@xxxxxxx
Subject: Re: mds slow request with "failed to authpin, subtree is being exported"

Hi,

we saw this about a year ago in a Nautilus cluster with multi-active
MDS as well. It turned up only once in several years, so we decided
not to look into it too closely at the time. How often do you see it?
Is it reproducible? If it is, I'd recommend creating a tracker issue.

Regards,
Eugen

Quoting zxcs <zhuxiongcs@xxxxxxx>:

> Hi experts,
>
> we are using CephFS 16.2.* with multi-active MDS, and recently we
> mounted two nodes with ceph-fuse because of their old OS.
>
> One node runs a Python script that calls `glob.glob(path)` while
> another client performs a `cp` operation on the same path.
>
> We then see "mds slow request" messages, and the logs complain
> "failed to authpin, subtree is being exported".
>
> We then need to restart the MDS.
>
>
> Our question is: is there a deadlock somewhere? How can we avoid
> this, and how can we fix it without restarting the MDS (a restart
> affects other users)?
>
>
> Thanks a ton!
>
>
> xz


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx