Re: mds slow request with "failed to authpin, subtree is being exported"

On 11/27/23 13:12, zxcs wrote:
Currently we use `ceph config set mds mds_bal_interval 3600` to set a fixed interval (1 hour).

We also have a question about how to turn off balancing with multiple active MDS daemons.

That is, we want to enable multiple active MDS daemons (to improve throughput) with no balancing between them.

And if we set mds_bal_interval to a large number, would that avoid this issue?

You can just set 'mds_bal_interval' to 0.
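
For reference, the suggestion above could be applied roughly as follows. This is only a sketch: it reuses the `ceph config set` form quoted earlier in this thread, and the read-back with `ceph config get` is just a sanity check. That a value of 0 stops the balancer is taken from the reply above, not verified here.

```shell
# Per the reply above: set the balancer interval to 0 for all MDS daemons.
ceph config set mds mds_bal_interval 0

# Read the stored value back to confirm the change was recorded.
ceph config get mds mds_bal_interval
```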




Thanks,
xz

On Nov 27, 2023, at 10:56, Ben <ruidong.gao@xxxxxxxxx> wrote:

With the same MDS configuration, we see exactly the same thing (problem,
logs, and solution) on 17.2.5, recurring every couple of days. The MDS
servers get stuck somewhere, yet ceph status reports no issue. We need to
restart some of the MDS daemons (if not all of them) to recover.
Hopefully this can be fixed soon, or the docs can be updated with a
warning about using the balancer in production environments.

thanks and regards

On Thu, Nov 23, 2023 at 15:47, Xiubo Li <xiubli@xxxxxxxxxx> wrote:


On 11/23/23 11:25, zxcs wrote:
Thanks a ton, Xiubo!

They do not disappear,

even after we unmount the ceph directory on these two old-OS nodes.

After dumping the ops in flight, we can see some requests, and the
earliest one complains "failed to authpin, subtree is being exported".
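
A rough way to check a saved op dump for this message is sketched below. The helper name `count_authpin_lines`, the daemon name `mds.a`, and the file path are placeholders, not taken from this thread. It counts matching lines rather than distinct requests, so treat the number as an indicator only.

```shell
# Count the lines in a saved op dump that mention the authpin/export stall.
# Usage: count_authpin_lines FILE
count_authpin_lines() {
  grep -c 'failed to authpin, subtree is being exported' "$1"
}

# On the MDS host (mds.a is a placeholder daemon name), one might run:
#   ceph daemon mds.a dump_ops_in_flight > /tmp/mds_ops.json
#   count_authpin_lines /tmp/mds_ops.json
```

A count that stays above zero across repeated dumps would match the behaviour described above, where the requests do not clear on their own.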

How can we avoid this? Would you please help shed some light here?

Okay, as Frank mentioned, you can try to disable the balancer by pinning
the directories. As I remember, the balancer is buggy.
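
As a sketch of what pinning could look like: CephFS export pins are set with the `ceph.dir.pin` extended attribute, where the value is the MDS rank and -1 removes the pin. The helper below only prints `setfattr` commands so they can be reviewed before being run. The name `print_pin_cmds`, the round-robin assignment, and the `/mnt/cephfs/...` paths are illustrative assumptions, not something from this thread.

```shell
# Print (do not run) one setfattr command per directory,
# assigning MDS ranks round-robin.
# Usage: print_pin_cmds NUM_RANKS DIR...
print_pin_cmds() {
  ranks=$1; shift
  i=0
  for dir in "$@"; do
    printf 'setfattr -n ceph.dir.pin -v %d %s\n' "$((i % ranks))" "$dir"
    i=$((i + 1))
  done
}

# Example with two active MDS ranks and hypothetical mount paths:
print_pin_cmds 2 /mnt/cephfs/projA /mnt/cephfs/projB /mnt/cephfs/projC
```

The printed commands would be run on a client that has the file system mounted. Paths containing spaces would need quoting before the output is reused.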

You can also open a Ceph tracker ticket and provide the debug logs if
you have them.

Thanks

- Xiubo


Thanks,
xz


On Nov 22, 2023, at 19:44, Xiubo Li <xiubli@xxxxxxxxxx> wrote:


On 11/22/23 16:02, zxcs wrote:
Hi experts,

We are using CephFS 16.2.* with multiple active MDS daemons, and recently
we have had two nodes mounting it with ceph-fuse because of their old OS.

One node runs a Python script that calls `glob.glob(path)`, while another
client does a `cp` operation on the same path.

Then we see `mds slow request` logs complaining
"failed to authpin, subtree is being exported",

and then we need to restart the MDS.


Our questions: is there a deadlock? How can we avoid this, and how can we
fix it without restarting the MDS (which affects other users)?
BTW, won't the slow requests disappear by themselves later?

It looks like the export is slow, or too many exports are in progress.

Thanks

- Xiubo

Thanks a ton!


xz
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx