Re: [Ceph-users] Re: MDS failing under load with large cache sizes

Update: it turns out I just had to wait for an hour. The MDSs were
sending beacons regularly, so the MONs didn't try to kill them and
instead let them finish whatever they were doing.
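
In case someone hits the same situation: while waiting, something like
the following should show whether the MONs still consider the MDS alive
and which state it is in (the daemon name is a placeholder for your own
MDS):

    # MDS ranks and their states (up:replay, up:rejoin, up:active, ...)
    # as currently seen by the MONs
    ceph fs status

    # How many seconds the MONs tolerate missing beacons before they
    # mark an MDS as laggy and replace it
    ceph config get mds mds_beacon_grace

    # Ask the daemon itself via its admin socket (run on the MDS host)
    ceph daemon mds.<name> status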

Unlike the other bug, where the number of open files outgrows what the
MDS can handle, this incident resolved itself ("self-healing"), but I
still consider it a severe bug.
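
For the archives: the workaround for that other bug was deleting the
per-rank open file table object from the metadata pool while the MDS
was stopped, roughly like this (assuming a systemd deployment; the
daemon name, pool name and rank are placeholders for your setup, so
double-check before running anything):

    # Stop the affected MDS, then delete the open file table object for
    # its rank (rank 0 here) from the CephFS metadata pool.
    systemctl stop ceph-mds@<name>
    rados -p cephfs_metadata rm mds0_openfiles.0

    # Start the MDS again and watch it replay
    systemctl start ceph-mds@<name>
    ceph fs status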


On 06/01/2020 12:05, Janek Bevendorff wrote:
> Hi, my MDS failed again, but this time I cannot recover it by deleting
> the mds*_openfiles.0 object. The startup behaviour is also different.
> Both inode count and cache size stay at zero while the MDS is replaying.
>
> When I set the MDS log level to 7, I get tons of these messages:
>
> 2020-01-06 11:59:49.303 7f30149e4700  7 mds.1.cache  current root is
> [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1
> state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855
> rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079)
> hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900]
> 2020-01-06 11:59:49.323 7f30149e4700  7 mds.1.cache adjust_subtree_auth
> -1,-2 -> -2,-2 on [dir 0x1000ae4a784 /XXX/XXX/ [2,head] auth v=114
> cv=0/0 state=1073741824 f(v0 m2019-08-23 05:07:32.658490 9=9+0) n(v1
> rc2019-09-16 15:51:58.418555 b21646377 9=9+0) hs=0+0,ss=0+0 0x5608c602cd00]
> 2020-01-06 11:59:49.323 7f30149e4700  7 mds.1.cache  current root is
> [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1
> state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855
> rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079)
> hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900]
> 2020-01-06 11:59:49.343 7f30149e4700  7 mds.1.cache adjust_subtree_auth
> -1,-2 -> -2,-2 on [dir 0x1000ae4a78b /XXX/XXX/ [2,head] auth v=102
> cv=0/0 state=1073741824 f(v0 m2019-08-23 05:07:35.046498 9=9+0) n(v1
> rc2019-09-16 15:51:58.478556 b1430317 9=9+0) hs=0+0,ss=0+0 0x5608c602d200]
> 2020-01-06 11:59:49.343 7f30149e4700  7 mds.1.cache  current root is
> [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1
> state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855
> rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079)
> hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900]
> 2020-01-06 11:59:49.363 7f30149e4700  7 mds.1.cache adjust_subtree_auth
> -1,-2 -> -2,-2 on [dir 0x1000ae4a78e /XXX/XXX/ [2,head] auth v=91 cv=0/0
> state=1073741824 f(v0 m2019-08-23 05:07:38.986513 8=8+0) n(v1
> rc2019-09-16 15:51:58.498556 b1932614 8=8+0) hs=0+0,ss=0+0 0x5608c602d700]
> 2020-01-06 11:59:49.363 7f30149e4700  7 mds.1.cache  current root is
> [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1
> state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855
> rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079)
> hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900]
>
> Is there any way I can recover the MDS? I tried wiping sessions on
> startup etc., but nothing worked.
>
> Thanks
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



