How to avoid a laggy 3-mds fs when 1 mds rejoins?

Dzianis Kahanovich <mahatma@xxxxxxxxxxxxxx> · Tue, 06 Oct 2015 13:43:41 +0300

In short: how can I reliably avoid (if possible) fs freezes while 1 of 3 mds rejoins?

ceph version 0.94.3-242-g79385a8 (79385a85beea9bccd82c99b6bda653f0224c4fcd)

I am moving 2 VM clients from ocfs2 (which started to deadlock the VMs on
snapshot) to cephfs (at least I can back it up). Maybe I just did not notice it
before, or maybe there is a cephfs pressure problem, but while 1 of 3 mds is
rejoining (slowly!), the whole mds cluster gets stuck (the good news: all
clients are alive afterwards). How can I make the mds cluster stay reliable
across at least 1 restart?

My current mds config:

[mds]
        mds recall state timeout = 120
        mds bal mode = 1
        mds standby replay = true
        mds cache size = 500000
        mds mem max = 2097152
        mds op history size = 50
        # vs. laggy beacon
        mds decay halflife = 9
        mds beacon interval = 8
        mds beacon grace = 30

[mds.a]
        host = megaserver1
[mds.b]
        host = megaserver3
[mds.c]
        host = megaserver4
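
As a rough sanity check on the beacon settings above, the sketch below works out
how many whole beacon intervals fit inside the grace period. It assumes that
"mds beacon interval" is the period at which the mds sends beacons and that
"mds beacon grace" is the time without beacons after which the mds is declared
laggy. The helper name missed_beacons_allowed is made up for illustration, and
the parsing uses plain Python configparser, not any Ceph API.

```python
import configparser

# The beacon settings from the [mds] section above (other options omitted).
CEPH_CONF = """
[mds]
        mds beacon interval = 8
        mds beacon grace = 30
"""


def missed_beacons_allowed(conf_text, section="mds"):
    """Return how many whole beacon intervals fit inside the grace period.

    Hypothetical helper: assumes 'mds beacon interval' is the send period
    and 'mds beacon grace' is the silence after which the mds is laggy.
    """
    parser = configparser.ConfigParser()
    # Strip indentation so configparser never treats a line as a continuation.
    stripped = "\n".join(line.strip() for line in conf_text.splitlines())
    parser.read_string(stripped)
    interval = parser.getfloat(section, "mds beacon interval")
    grace = parser.getfloat(section, "mds beacon grace")
    if interval <= 0:
        raise ValueError("beacon interval must be positive")
    return int(grace // interval)


if __name__ == "__main__":
    print(missed_beacons_allowed(CEPH_CONF))
```

With the values above, 30 / 8 = 3.75, so three full intervals pass before the
grace period runs out.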

(I tried reverting all non-default settings; IMHO it made no difference.
Correct me if I am wrong.)
Or maybe I need to take special care when stopping an mds (currently I use SIGKILL).
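
One way to test whether the stop method matters is to send SIGTERM first and
fall back to SIGKILL only if the process does not exit in time. A minimal
sketch, assuming the ceph-mds pid is known and that the daemon shuts down
cleanly on SIGTERM (not verified here); stop_gracefully and pid_alive are
illustrative names, not Ceph tools.

```python
import os
import signal
import time


def pid_alive(pid):
    """Return True if a process with this pid exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_gracefully(pid, timeout=30.0, poll=0.5, is_alive=pid_alive):
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL.

    Returns "SIGTERM" or "SIGKILL" for the signal that ended the process,
    or None if the process was already gone before anything was sent.
    """
    if not is_alive(pid):
        return None
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return None  # exited between the liveness check and the signal
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "SIGTERM"
        time.sleep(poll)
    if not is_alive(pid):
        return "SIGTERM"
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "SIGTERM"  # exited just before the fallback was sent
    return "SIGKILL"
```

For example, stop_gracefully(mds_pid, timeout=60) would give the daemon a
minute to exit on its own, where mds_pid is a placeholder for the real pid.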

--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com