On Tue, Dec 5, 2023 at 6:34 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
>
> On 12/4/23 16:25, zxcs wrote:
> > Thanks a lot, Xiubo!
> >
> > We already set 'mds_bal_interval' to 0, and the slow MDS requests seem to have decreased.
> >
> > But somehow we still see the MDS complain about slow requests, and in the MDS log we can see:
> >
> > "slow request *** seconds old, received at 2023-12-04T…: internal op exportdir:mds.* currently acquired locks"
> >
> > So our question is: why do we still see "internal op exportdir"? Does any other config also need to be set to 0? Could you please shed some light on which config we need to set?
>
> IMO, this should be enough.
>
> Venky,
>
> Did I miss something here?

You missed nothing. Setting `mds_bal_interval = 0` disables the balancer. I guess there are in-progress exports that would take some time to back off, and the slow ops should eventually get cleaned up. I'd say wait a bit and see if the slow requests resolve by themselves.

FWIW, there was a feature request a while back to cancel an ongoing export. We should prioritize having that.
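
For reference, a minimal sketch of the commands discussed in this thread (this assumes the centralized config database is in use, as in the `ceph config set` example quoted below):

    # Disable the MDS balancer entirely (0 means the balancer never runs):
    ceph config set mds mds_bal_interval 0

    # Confirm the value actually took effect:
    ceph config get mds mds_bal_interval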
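
For the directory pinning approach suggested further down in the thread, export pins are set via an extended attribute on a mounted filesystem; the mount path below is illustrative:

    # Pin a directory tree to MDS rank 0 so the balancer never migrates it:
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/shared

    # A value of -1 removes the pin (the directory follows its parent again):
    setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/shared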
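
And to watch whether the in-flight export ops are actually draining, they can be dumped through the MDS admin socket on the node hosting the daemon; the daemon name here is illustrative:

    # List the current slow/in-flight requests on one MDS:
    ceph daemon mds.ceph-node1 dump_ops_in_flight

    # The cluster-wide slow request warnings are summarized in:
    ceph health detail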
> Thanks
>
> - Xiubo
>
> > Thanks,
> > xz
> >
> >> On Nov 27, 2023, at 13:19, Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> >>
> >> On 11/27/23 13:12, zxcs wrote:
> >>> Currently we are using `ceph config set mds mds_bal_interval 3600` to set a fixed interval (1 hour).
> >>>
> >>> We also have a question about how to disable balancing for multiple active MDS daemons.
> >>>
> >>> That is, we want to enable multiple active MDS daemons (to improve throughput) with no balancing between them.
> >>>
> >>> If we set mds_bal_interval to a big number, can we avoid this issue?
> >>
> >> You can just set 'mds_bal_interval' to 0.
> >>
> >>> Thanks,
> >>> xz
> >>>
> >>>> On Nov 27, 2023, at 10:56, Ben <ruidong.gao@xxxxxxxxx> wrote:
> >>>>
> >>>> With the same MDS configuration, we see exactly the same (problem, logs, and solution) with 17.2.5, constantly happening again and again at intervals of a couple of days. The MDS servers get stuck somewhere, yet ceph status reports no issue. We need to restart some of the MDS daemons (if not all of them) to restore them. Hopefully this can be fixed soon, or the docs updated with a warning about the balancer's use in production environments.
> >>>>
> >>>> thanks and regards
> >>>>
> >>>> Xiubo Li <xiubli@xxxxxxxxxx> wrote on Thu, Nov 23, 2023 at 15:47:
> >>>>
> >>>>> On 11/23/23 11:25, zxcs wrote:
> >>>>>> Thanks a ton, Xiubo!
> >>>>>>
> >>>>>> It does not disappear, even after we unmounted the ceph directory on these two old-OS nodes.
> >>>>>>
> >>>>>> After dumping the ops in flight, we can see some requests, and the earliest complain "failed to authpin, subtree is being exported".
> >>>>>>
> >>>>>> How can we avoid this? Would you please shed some light here?
> >>>>>
> >>>>> Okay, as Frank mentioned, you can try to disable the balancer by pinning the directories. As I remember, the balancer is buggy.
> >>>>>
> >>>>> You can also raise a ceph tracker issue and provide the debug logs if you have them.
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> - Xiubo
> >>>>>
> >>>>>> Thanks,
> >>>>>> xz
> >>>>>>
> >>>>>>> On Nov 22, 2023, at 19:44, Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>> On 11/22/23 16:02, zxcs wrote:
> >>>>>>>> Hi, experts,
> >>>>>>>>
> >>>>>>>> We are using CephFS 16.2.* with multiple active MDS daemons, and recently we have had two nodes mounted with ceph-fuse due to their old OS.
> >>>>>>>>
> >>>>>>>> One node runs a Python script with `glob.glob(path)`, and another client is doing a `cp` operation on the same path. Then we see some logs about `mds slow request`, and the logs complain "failed to authpin, subtree is being exported"; then we need to restart the MDS.
> >>>>>>>>
> >>>>>>>> Our question is: is there any deadlock? How can we avoid this, and how can we fix it without restarting the MDS (it will influence other users)?
> >>>>>>>
> >>>>>>> BTW, won't the slow requests disappear by themselves later?
> >>>>>>>
> >>>>>>> It looks like the exporting is slow or there are too many exports going on.
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>> - Xiubo
> >>>>>>>
> >>>>>>>> Thanks a ton!
> >>>>>>>>
> >>>>>>>> xz

--
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx