John Spray writes:
Short version: how can I reliably avoid (if possible) filesystem freezes while 1 of 3 MDS daemons is in rejoin?
ceph version 0.94.3-242-g79385a8 (79385a85beea9bccd82c99b6bda653f0224c4fcd)
I am moving 2 VM clients from ocfs2 (which started to deadlock the VMs on snapshot) to
cephfs (at least I can back it up). Maybe I simply didn't notice it before, maybe
there is some cephfs pressure problem, but while 1 of 3 MDS daemons is in rejoin (slow!)
the whole MDS cluster is stuck (the good news: all clients stay alive afterwards). How can I
make the MDS cluster survive at least 1 restart?
It's not exactly clear to me how you've got this set up. What's the
output of "ceph status"?
    cluster 4fc73849-f913-4689-b6a6-efcefccae8d1
     health HEALTH_OK
     monmap e1: 3 mons at {a=10.227.227.101:6789/0,b=10.227.227.103:6789/0,c=10.227.227.104:6789/0}
            election epoch 28556, quorum 0,1,2 a,b,c
     mdsmap e7136: 1/1/1 up {0=c=up:active}, 1 up:standby-replay, 1 up:standby
     osdmap e158986: 15 osds: 15 up, 15 in
      pgmap v60013179: 6032 pgs, 8 pools, 6528 GB data, 2827 kobjects
            16257 GB used, 6005 GB / 22263 GB avail
                6032 active+clean
  client io 3211 kB/s rd, 1969 kB/s wr, 176 op/s
OK, thanks. So the symptom is that when you have an MDS failure, the
standby-replay guy is coming up, but he is spending too long in
'rejoin' state, right? How long, exactly?
Even now, with "mds standby replay = true" removed:
e7151: 1/1/1 up {0=b=up:active}, 2 up:standby
the cluster gets stuck when I KILL the active mds.b. How do I stop an MDS correctly to get
MON-like behaviour (leader goes down, a peon becomes the leader)?
Usually I upgrade ceph with a simple "/etc/init.d/ceph restart" (KILLs + starts).
To my surprise, only the MDS needs special handling.
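For reference, what I would try instead of a plain SIGKILL (a rough sketch on my side, assuming the stock sysvinit script and the documented "ceph mds fail" command):

# mark rank 0 (the active MDS) as failed so a standby takes over promptly,
# then stop the daemon cleanly instead of SIGKILLing it
ceph mds fail 0
/etc/init.d/ceph stop mds.b
# ... upgrade / maintenance ...
/etc/init.d/ceph start mds.b   # the daemon comes back as a standby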
You could try setting a higher debug level (e.g. debug mds = 10) on
your MDS before it takes over, so that the log output can give us an
idea of what the daemon is doing while it's stuck in rejoin.
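For example, something like this (a sketch; injectargs only applies while the daemon is up and responsive, otherwise set it in ceph.conf under [mds]):

# raise MDS debug verbosity on the live daemon
ceph tell mds.b injectargs '--debug-mds 10'
# ... reproduce the slow rejoin and capture the log, then lower it again
ceph tell mds.b injectargs '--debug-mds 1'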
OK, debugging will have to wait - this is production ;). But even a 5 min. rejoin should not be
a big problem if the other 2 nodes are up. Even degraded OSDs stay accessible (given balanced
size/min_size & CRUSH).
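By balanced size/min_size I mean something like this (a sketch; "data" is just an example pool name):

# with size 3 / min_size 2, a PG keeps serving I/O while 1 replica is down
ceph osd pool get data size
ceph osd pool get data min_size
ceph osd pool set data min_size 2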
About the rejoin times, it's simple:
# grep rejoin ceph-mds.b.log
2015-10-06 03:16:28.102415 7f91d0852700 1 mds.0.634 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 03:16:28.102417 7f91d0852700 1 mds.0.634 rejoin_start
2015-10-06 03:16:33.028668 7f91d0852700 1 mds.0.634 rejoin_joint_start
2015-10-06 03:18:05.266657 7f91cc749700 1 mds.0.634 rejoin_done
2015-10-06 03:18:06.070233 7f91d0852700 1 mds.0.634 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 03:31:22.860780 7f8ab6643700 1 mds.0.636 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 03:31:22.860783 7f8ab6643700 1 mds.0.636 rejoin_start
2015-10-06 03:31:32.771089 7f8ab6643700 1 mds.0.636 rejoin_joint_start
2015-10-06 03:32:48.644880 7f8ab2439700 1 mds.0.636 rejoin_done
2015-10-06 03:32:49.412352 7f8ab6643700 1 mds.0.636 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 03:57:03.625397 7f981c944700 1 mds.0.639 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 03:57:03.625400 7f981c944700 1 mds.0.639 rejoin_start
2015-10-06 03:57:14.561840 7f981c944700 1 mds.0.639 rejoin_joint_start
2015-10-06 03:58:26.875557 7f981883b700 1 mds.0.639 rejoin_done
2015-10-06 03:58:28.159967 7f981c944700 1 mds.0.639 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 04:12:49.984929 7f1afa6d7700 1 mds.0.642 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 04:12:49.984932 7f1afa6d7700 1 mds.0.642 rejoin_start
2015-10-06 04:13:01.391428 7f1afa6d7700 1 mds.0.642 rejoin_joint_start
2015-10-06 04:14:38.680632 7f1af65ce700 1 mds.0.642 rejoin_done
2015-10-06 04:14:39.802623 7f1afa6d7700 1 mds.0.642 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 04:23:55.942713 7f028b2a9700 1 mds.0.645 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 04:23:55.942716 7f028b2a9700 1 mds.0.645 rejoin_start
2015-10-06 04:24:06.260830 7f028b2a9700 1 mds.0.645 rejoin_joint_start
2015-10-06 04:24:19.627641 7f028699f700 1 mds.0.645 suicide. wanted down:dne,
now up:rejoin
2015-10-06 04:35:53.910743 7f32cee3d700 1 mds.0.648 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 04:35:53.910746 7f32cee3d700 1 mds.0.648 rejoin_start
2015-10-06 04:36:03.541504 7f32cee3d700 1 mds.0.648 rejoin_joint_start
2015-10-06 04:37:14.470805 7f32cad34700 1 mds.0.648 rejoin_done
2015-10-06 04:37:15.390864 7f32cee3d700 1 mds.0.648 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 04:40:46.878251 7f6600df0700 1 mds.0.651 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 04:40:46.878254 7f6600df0700 1 mds.0.651 rejoin_start
2015-10-06 04:40:57.984821 7f6600df0700 1 mds.0.651 rejoin_joint_start
2015-10-06 04:43:23.230549 7f65fcce7700 1 mds.0.651 rejoin_done
2015-10-06 04:43:23.841793 7f6600df0700 1 mds.0.651 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 04:50:56.961706 7fb5871a5700 1 mds.0.655 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 04:50:56.961709 7fb5871a5700 1 mds.0.655 rejoin_start
2015-10-06 04:51:06.743421 7fb5871a5700 1 mds.0.655 rejoin_joint_start
2015-10-06 04:51:09.134144 7fb58289b700 1 mds.0.655 suicide. wanted down:dne,
now up:rejoin
2015-10-06 04:56:27.819070 7f64123e5700 1 mds.0.657 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 04:56:27.819072 7f64123e5700 1 mds.0.657 rejoin_start
2015-10-06 04:56:27.839223 7f64123e5700 1 mds.0.657 rejoin_joint_start
2015-10-06 04:56:30.375895 7f640e2dc700 1 mds.0.657 rejoin_done
2015-10-06 04:56:31.858593 7f64123e5700 1 mds.0.657 handle_mds_map state change
up:rejoin --> up:clientreplay
2015-10-06 05:06:11.023545 7feef429a700 1 mds.0.660 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 05:06:11.023548 7feef429a700 1 mds.0.660 rejoin_start
2015-10-06 05:06:11.433153 7feef429a700 1 mds.0.660 rejoin_joint_start
2015-10-06 05:06:46.113313 7feef1a95700 1 mds.0.660 rejoin_done
2015-10-06 05:06:47.515843 7feef429a700 1 mds.0.660 handle_mds_map state change
up:rejoin --> up:clientreplay
2015-10-06 09:42:59.932714 7fccadb81700 1 mds.0.664 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 09:42:59.932717 7fccadb81700 1 mds.0.664 rejoin_start
2015-10-06 09:43:00.497196 7fccadb81700 1 mds.0.664 rejoin_joint_start
2015-10-06 09:43:57.889918 7fcca9a78700 1 mds.0.664 rejoin_done
2015-10-06 09:43:58.490246 7fccadb81700 1 mds.0.664 handle_mds_map state change
up:rejoin --> up:clientreplay
2015-10-06 10:42:24.162929 7f7ddf175700 1 mds.0.666 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 10:42:24.162931 7f7ddf175700 1 mds.0.666 rejoin_start
2015-10-06 10:42:38.235885 7f7ddf175700 1 mds.0.666 rejoin_joint_start
2015-10-06 10:47:30.636129 7f7ddb06c700 1 mds.0.666 rejoin_done
2015-10-06 10:47:32.037131 7f7ddf175700 1 mds.0.666 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 12:10:32.496677 7f94435f4700 1 mds.0.670 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 12:10:32.496681 7f94435f4700 1 mds.0.670 rejoin_start
2015-10-06 12:10:45.968556 7f94435f4700 1 mds.0.670 rejoin_joint_start
2015-10-06 12:14:10.590516 7f943ecea700 1 mds.0.670 suicide. wanted down:dne,
now up:rejoin
2015-10-06 12:57:19.796554 7fbe9955b700 1 mds.0.676 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 12:57:19.796557 7fbe9955b700 1 mds.0.676 rejoin_start
2015-10-06 12:57:31.080582 7fbe9955b700 1 mds.0.676 rejoin_joint_start
2015-10-06 12:59:39.291300 7fbe95452700 1 mds.0.676 rejoin_done
2015-10-06 12:59:40.162822 7fbe9955b700 1 mds.0.676 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 14:41:48.552281 7f8bc218d700 1 mds.0.681 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 14:41:48.552284 7f8bc218d700 1 mds.0.681 rejoin_start
2015-10-06 14:41:49.242241 7f8bc218d700 1 mds.0.681 rejoin_joint_start
2015-10-06 14:42:32.421263 7f8bbe084700 1 mds.0.681 rejoin_done
2015-10-06 14:42:33.341350 7f8bc218d700 1 mds.0.681 handle_mds_map state change
up:rejoin --> up:active
John
PS I know - too many PGs; hence "mon pg warn max per osd = 1400"...
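Back to the rejoin times: from the log above, the durations can be pulled out with something like this (a rough sketch that pairs each rejoin_start with the following rejoin_done; it ignores the suicide cases and assumes no midnight wrap):

grep -E 'rejoin_start|rejoin_done' ceph-mds.b.log | awk '
{
    split($2, t, ":");                  # HH:MM:SS.frac -> seconds of day
    secs = t[1]*3600 + t[2]*60 + t[3];
    if ($NF == "rejoin_start") start = secs;
    else if (start) { printf "%s rejoin took %.0f s\n", $1, secs - start; start = 0 }
}'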
My current mds config:
[mds]
mds recall state timeout = 120
mds bal mode = 1
mds standby replay = true
mds cache size = 500000
mds mem max = 2097152
mds op history size = 50
# vs. laggy beacon
mds decay halflife = 9
mds beacon interval = 8
mds beacon grace = 30
[mds.a]
host = megaserver1
[mds.b]
host = megaserver3
[mds.c]
host = megaserver4
(I am trying to switch all the non-defaults back off - IMHO no results so far, fixme)
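If I go back to standby-replay I would pin it explicitly, something like this (a sketch using the stock standby options):

[mds.c]
host = megaserver4
# follow rank 0 as standby-replay instead of plain standby
mds standby replay = true
mds standby for rank = 0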
Or maybe I need to take special care when stopping an MDS (currently: SIGKILL).
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich,
http://mahatma.bspu.unibel.by/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com