Update: it turns out I just had to wait for an hour. The MDSs were
sending beacons regularly, so the MONs didn't try to kill them and
instead let them finish doing whatever they were doing. Unlike the
other bug, where the number of open files outgrows what the MDS can
handle, this incident allowed "self-healing", but I still consider it
a severe bug. Rough sketches of the commands involved are appended at
the end of this mail for reference.

On 06/01/2020 12:05, Janek Bevendorff wrote:
> Hi, my MDS failed again, but this time I cannot recover it by deleting
> the mds*_openfiles.0 object. The startup behaviour is also different:
> both inode count and cache size stay at zero while the MDS is replaying.
>
> When I set the MDS log level to 7, I get tons of these messages:
>
> 2020-01-06 11:59:49.303 7f30149e4700 7 mds.1.cache current root is
> [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1
> state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855
> rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079)
> hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900]
> 2020-01-06 11:59:49.323 7f30149e4700 7 mds.1.cache adjust_subtree_auth
> -1,-2 -> -2,-2 on [dir 0x1000ae4a784 /XXX/XXX/ [2,head] auth v=114
> cv=0/0 state=1073741824 f(v0 m2019-08-23 05:07:32.658490 9=9+0) n(v1
> rc2019-09-16 15:51:58.418555 b21646377 9=9+0) hs=0+0,ss=0+0 0x5608c602cd00]
> 2020-01-06 11:59:49.323 7f30149e4700 7 mds.1.cache current root is
> [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1
> state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855
> rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079)
> hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900]
> 2020-01-06 11:59:49.343 7f30149e4700 7 mds.1.cache adjust_subtree_auth
> -1,-2 -> -2,-2 on [dir 0x1000ae4a78b /XXX/XXX/ [2,head] auth v=102
> cv=0/0 state=1073741824 f(v0 m2019-08-23 05:07:35.046498 9=9+0) n(v1
> rc2019-09-16 15:51:58.478556 b1430317 9=9+0) hs=0+0,ss=0+0 0x5608c602d200]
> 2020-01-06 11:59:49.343 7f30149e4700 7 mds.1.cache current root is
> [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1
> state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855
> rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079)
> hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900]
> 2020-01-06 11:59:49.363 7f30149e4700 7 mds.1.cache adjust_subtree_auth
> -1,-2 -> -2,-2 on [dir 0x1000ae4a78e /XXX/XXX/ [2,head] auth v=91 cv=0/0
> state=1073741824 f(v0 m2019-08-23 05:07:38.986513 8=8+0) n(v1
> rc2019-09-16 15:51:58.498556 b1932614 8=8+0) hs=0+0,ss=0+0 0x5608c602d700]
> 2020-01-06 11:59:49.363 7f30149e4700 7 mds.1.cache current root is
> [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1
> state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855
> rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079)
> hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900]
>
> Is there any way I can recover the MDS? I tried wiping sessions on
> startup etc., but nothing worked.
>
> Thanks
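
For anyone who lands here via the archives, a minimal sketch of how to
watch an MDS work through replay and bump the log level as above. The
daemon name "mds1" is a placeholder for your own MDS, and the "ceph
daemon" commands assume you run them on that daemon's host:

  # MDS ranks and their states; look for up:replay here. As above, an
  # MDS that is still beaconing can sit in it for a long time.
  ceph fs status
  ceph mds stat

  # Ask the daemon itself via its admin socket (run on the MDS host):
  ceph daemon mds.mds1 status

  # Raise MDS logging to level 7, as in the excerpt above, and turn it
  # back down once you have what you need:
  ceph daemon mds.mds1 config set debug_mds 7
  ceph daemon mds.mds1 config set debug_mds 1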
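
The reason the MONs left the daemons alone is the beacon grace: a MON
only fails over an MDS that has stopped sending beacons for
mds_beacon_grace seconds (15 by default). A sketch for inspecting it,
and raising it if a slow-but-alive MDS ever does get killed mid-replay
(60 is an arbitrary example value, not a recommendation):

  ceph config get mds mds_beacon_grace
  # Example only: give slow-but-alive MDS daemons more headroom.
  ceph config set mds mds_beacon_grace 60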
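
And for completeness, the openfiles workaround referenced in the quoted
mail, roughly as it applies to the other (open-file-count) bug. The
metadata pool name "cephfs_metadata" and the systemd instance name are
assumptions for illustration; stop the affected MDS before removing
anything, and treat this as a last resort on metadata objects:

  systemctl stop ceph-mds@mds1

  # The open file table lives in the metadata pool, one object set per
  # rank (rank N -> mdsN_openfiles.0, possibly .1, .2, ... shards):
  rados -p cephfs_metadata ls | grep openfiles

  # Remove the object(s) for the affected rank, then restart the MDS:
  rados -p cephfs_metadata rm mds1_openfiles.0
  systemctl start ceph-mds@mds1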