Re: avoid 3-mds fs laggy on 1 rejoin?

John Spray <jspray@xxxxxxxxxx> · Tue, 6 Oct 2015 12:02:01 +0100

On Tue, Oct 6, 2015 at 11:43 AM, Dzianis Kahanovich
<mahatma@xxxxxxxxxxxxxx> wrote:
> In short: how can I reliably avoid (if possible) fs freezes when 1 of 3 mds
> rejoins?
>
> ceph version 0.94.3-242-g79385a8 (79385a85beea9bccd82c99b6bda653f0224c4fcd)
>
> I am moving 2 VM clients from ocfs2 (which started to deadlock the VM on
> snapshot) to cephfs (at least I can back it up). Maybe I just didn't notice
> it before, maybe there is a cephfs pressure problem, but while 1 of 3 mds
> rejoins (slowly!), the whole mds cluster is stuck (but, good news: all
> clients are alive afterwards). How can I make the mds cluster reliable across
> at least 1 restart?

It's not exactly clear to me how you've got this set up.  What's the
output of "ceph status"?

John

>
> My current mds config:
>
> [mds]
>         mds recall state timeout = 120
>         mds bal mode = 1
>         mds standby replay = true
>         mds cache size = 500000
>         mds mem max = 2097152
>         mds op history size = 50
>         # vs. laggy beacon
>         mds decay halflife = 9
>         mds beacon interval = 8
>         mds beacon grace = 30
>
> [mds.a]
>         host = megaserver1
> [mds.b]
>         host = megaserver3
> [mds.c]
>         host = megaserver4
>
> (I have tried reverting all non-default settings; IMHO it made no
> difference - correct me if I'm wrong.)
> Or maybe I need to take special care when stopping an mds (currently SIGKILL).
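
For what it's worth, the beacon settings in the quoted config can be read as a
rough failure-detection window. A minimal sketch, assuming (as I understand the
Ceph docs) that "mds beacon interval" is the number of seconds between beacons
an mds sends to the monitors and "mds beacon grace" is how long the monitors
wait without a beacon before marking that mds laggy. The function name
beacons_per_grace is hypothetical, not part of any Ceph tool:

```python
import configparser

# The two beacon options from the quoted [mds] section, copied verbatim
# (leading indentation removed).
MDS_CONF = """
[mds]
mds beacon interval = 8
mds beacon grace = 30
"""


def beacons_per_grace(conf_text: str) -> float:
    """Return how many beacon intervals fit into the grace period.

    Assumption: both values are in seconds, so the ratio is the number of
    consecutive beacons that must be missed before the mds is marked laggy.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    interval = parser.getfloat("mds", "mds beacon interval")
    grace = parser.getfloat("mds", "mds beacon grace")
    return grace / interval


if __name__ == "__main__":
    print(beacons_per_grace(MDS_CONF))
```

With the values above this gives 3.75, so after a SIGKILL the monitors may wait
up to roughly 30 seconds before treating the daemon as laggy. Whether that
accounts for the freeze described here is not established.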
>
> --
> WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com