Hi Janek, I'd love to hear your standard maintenance procedures. Are you
cleaning up those open files outside of "rejoin" OOMs? I guess we're pretty
lucky with our CephFS clusters: we have more than 1k clients and it is pretty
solid (though the last upgrade had a hiccup that dropped us down to a single
active MDS).

-- Dan

On Fri, Dec 4, 2020 at 8:20 PM Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
> This is a very common issue. Deleting mdsX_openfiles.Y has become part of
> my standard maintenance repertoire. As soon as you have a few more
> clients and one of them starts opening and closing files in rapid
> succession (or does other metadata-heavy things), it becomes very likely
> that the MDS will crash and be unable to recover.
>
> There have been numerous fixes in the past, which improved the overall
> stability, but it is far from perfect. I am happy to see another patch in
> that direction, but I believe more effort needs to be spent here. It is
> way too easy to DoS the MDS from a single client. Our 78-node CephFS
> beats our old NFS RAID server in terms of throughput, but latency and
> stability are way behind.
>
> Janek
>
> On 04/12/2020 11:39, Dan van der Ster wrote:
> > Excellent!
> >
> > For the record, this PR is the plan to fix this:
> > https://github.com/ceph/ceph/pull/36089
> > (nautilus and octopus backports here: https://github.com/ceph/ceph/pull/37382
> > and https://github.com/ceph/ceph/pull/37383)
> >
> > Cheers, Dan
> >
> > On Fri, Dec 4, 2020 at 11:35 AM Anton Aleksandrov <anton@xxxxxxxxxxxxxx> wrote:
> >> Thank you very much! This solution helped:
> >>
> >> Stop all MDS, then:
> >> # rados -p cephfs_metadata_pool rm mds0_openfiles.0
> >> then start one MDS.
> >>
> >> We are back online. Amazing!!! :)
> >>
> >>
> >> On 04.12.2020 12:20, Dan van der Ster wrote:
> >>> Please also make sure mds_beacon_grace is high on the mons too.
> >>>
> >>> It doesn't matter which MDS you select to be the running one.
> >>>
> >>> Is the process getting killed and restarted?
> >>> If you're confident that the MDS is getting OOM-killed during the
> >>> rejoin step, then you might find this useful:
> >>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/028964.html
> >>>
> >>> Stop all MDS, then:
> >>> # rados -p cephfs_metadata_pool rm mds0_openfiles.0
> >>> then start one MDS.
> >>>
> >>> -- Dan
> >>>
> >>> On Fri, Dec 4, 2020 at 11:05 AM Anton Aleksandrov <anton@xxxxxxxxxxxxxx> wrote:
> >>>> Yes, the MDS eats all memory+swap, stays like that for a moment and
> >>>> then frees the memory.
> >>>>
> >>>> mds_beacon_grace was already set to 1800.
> >>>>
> >>>> Also, the other one shows this message: Map has assigned me to become
> >>>> a standby.
> >>>>
> >>>> Does it matter which MDS we stop and which we leave running?
> >>>>
> >>>> Anton
> >>>>
> >>>>
> >>>> On 04.12.2020 11:53, Dan van der Ster wrote:
> >>>>> How many active MDS's did you have? (max_mds == 1, right?)
> >>>>>
> >>>>> Stop the other two MDS's so you can focus on getting exactly one running.
> >>>>> Tail the log file and see what it is reporting.
> >>>>> Increase mds_beacon_grace to 600 so that the mon doesn't fail this
> >>>>> MDS while it is rejoining.
> >>>>>
> >>>>> Is that single MDS running out of memory during the rejoin phase?
> >>>>>
> >>>>> -- dan
> >>>>>
> >>>>> On Fri, Dec 4, 2020 at 10:49 AM Anton Aleksandrov <anton@xxxxxxxxxxxxxx> wrote:
> >>>>>> Hello community,
> >>>>>>
> >>>>>> we are on Ceph 13.2.8 - today something happened to one MDS, and
> >>>>>> ceph status reports that the filesystem is degraded.
> >>>>>> It won't mount either. I have taken down the server with the MDS
> >>>>>> that was not working. There are 2 more MDS servers, but they stay
> >>>>>> in "rejoin" state. Also, only 1 is shown under "services", even
> >>>>>> though there are 2.
> >>>>>>
> >>>>>> Both running MDS servers have these lines in their logs:
> >>>>>>
> >>>>>> heartbeat_map is_healthy 'MDSRank' had timed out after 15
> >>>>>> mds.beacon.mds2 Skipping beacon heartbeat to monitors (last acked
> >>>>>> 28.8979s ago); MDS internal heartbeat is not healthy!
> >>>>>>
> >>>>>> On one of the MDS nodes I enabled more detailed debug logging, so I
> >>>>>> am also getting:
> >>>>>>
> >>>>>> mds.beacon.mds3 Sending beacon up:standby seq 178
> >>>>>> mds.beacon.mds3 received beacon reply up:standby seq 178 rtt 0.000999968
> >>>>>>
> >>>>>> It makes no sense to me and is causing too much stress... Could
> >>>>>> anyone help, please?
> >>>>>>
> >>>>>> Anton.
> >>>>>> _______________________________________________
> >>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
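
For reference, here is the recovery sequence from this thread condensed into
commands. This is a rough sketch only: the metadata pool name
(cephfs_metadata_pool), the rank number (0) and the systemd target names are
assumptions that will differ per cluster, and the openfiles objects must only
be removed while every MDS is stopped.

# Raise the beacon grace so the mons do not fail the MDS while it rejoins
# (runtime injection; depending on the release you may prefer setting it in ceph.conf)
ceph tell mon.\* injectargs '--mds_beacon_grace 600'

# Stop every MDS daemon (run on each MDS host)
systemctl stop ceph-mds.target

# List and then remove the open file table objects for rank 0
# (mds0_openfiles.0, mds0_openfiles.1, ... if more than one exists)
rados -p cephfs_metadata_pool ls | grep openfiles
rados -p cephfs_metadata_pool rm mds0_openfiles.0

# Start a single MDS and watch it work through replay/rejoin
systemctl start ceph-mds.target
ceph fs status

Once that single MDS reaches up:active, the remaining MDS daemons can be
started again as standbys.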