Re: avoid 3-mds fs laggy on 1 rejoin?

John Spray writes:
Short: how can I reliably avoid (if possible) fs freezes while 1 of 3 MDS daemons is in rejoin?

ceph version 0.94.3-242-g79385a8
(79385a85beea9bccd82c99b6bda653f0224c4fcd)

I am moving 2 VM clients from ocfs2 (which started to deadlock the VMs on snapshot) to cephfs (at least I can back it up). Maybe I just never noticed it before, maybe there is a cephfs pressure problem, but while 1 of 3 MDS daemons is in rejoin (slow!) the whole MDS cluster is stuck (the good news: all clients are still alive afterwards). How do I make the MDS cluster survive at least 1 restart?


It's not exactly clear to me how you've got this set up.  What's the
output of "ceph status"?


     cluster 4fc73849-f913-4689-b6a6-efcefccae8d1
      health HEALTH_OK
      monmap e1: 3 mons at {a=10.227.227.101:6789/0,b=10.227.227.103:6789/0,c=10.227.227.104:6789/0}
             election epoch 28556, quorum 0,1,2 a,b,c
      mdsmap e7136: 1/1/1 up {0=c=up:active}, 1 up:standby-replay, 1 up:standby
      osdmap e158986: 15 osds: 15 up, 15 in
       pgmap v60013179: 6032 pgs, 8 pools, 6528 GB data, 2827 kobjects
             16257 GB used, 6005 GB / 22263 GB avail
                 6032 active+clean
   client io 3211 kB/s rd, 1969 kB/s wr, 176 op/s

OK, thanks.  So the symptom is that when you have an MDS failure, the
standby-replay guy is coming up, but he is spending too long in
'rejoin' state, right?  How long, exactly?

Even now, after removing "mds standby replay = true":
e7151: 1/1/1 up {0=b=up:active}, 2 up:standby
the cluster gets stuck when I KILL the active mds.b. How do I stop an MDS correctly, so that failover behaves like it does for MONs (leader goes down, a peon becomes leader)?

Usually I upgrade Ceph with a simple "/etc/init.d/ceph restart" (KILLs + starts). The surprise for me is that only the MDS needs special handling.
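
For the record, what I would try next is a gentler sequence than SIGKILL (a sketch assuming the stock hammer init script; I cannot yet confirm it shortens rejoin, it only avoids killing the daemon mid-write):

  # stop only the local MDS daemons via the init script (SIGTERM, not SIGKILL)
  /etc/init.d/ceph stop mds
  # watch a standby take over rank 0 before touching anything else
  ceph mds stat
  # bring the (upgraded) daemon back; it returns to the map as a standby
  /etc/init.d/ceph start mds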


You could try setting a higher debug level (e.g. debug mds = 10) on
your MDS before it takes over, so that the log output can give us an
idea of what the daemon is doing while it's stuck in rejoin.

OK, debugging will have to wait - this is production ;). But even a 5 min. rejoin should not be much of a problem if the other 2 nodes are up. Even degraded OSDs stay accessible (given balanced size/min_size & CRUSH).

As for the rejoin times, they are simple to pull out:
# grep rejoin ceph-mds.b.log
2015-10-06 03:16:28.102415 7f91d0852700  1 mds.0.634 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 03:16:28.102417 7f91d0852700  1 mds.0.634 rejoin_start
2015-10-06 03:16:33.028668 7f91d0852700  1 mds.0.634 rejoin_joint_start
2015-10-06 03:18:05.266657 7f91cc749700  1 mds.0.634 rejoin_done
2015-10-06 03:18:06.070233 7f91d0852700  1 mds.0.634 handle_mds_map state change up:rejoin --> up:active
2015-10-06 03:31:22.860780 7f8ab6643700  1 mds.0.636 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 03:31:22.860783 7f8ab6643700  1 mds.0.636 rejoin_start
2015-10-06 03:31:32.771089 7f8ab6643700  1 mds.0.636 rejoin_joint_start
2015-10-06 03:32:48.644880 7f8ab2439700  1 mds.0.636 rejoin_done
2015-10-06 03:32:49.412352 7f8ab6643700  1 mds.0.636 handle_mds_map state change up:rejoin --> up:active
2015-10-06 03:57:03.625397 7f981c944700  1 mds.0.639 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 03:57:03.625400 7f981c944700  1 mds.0.639 rejoin_start
2015-10-06 03:57:14.561840 7f981c944700  1 mds.0.639 rejoin_joint_start
2015-10-06 03:58:26.875557 7f981883b700  1 mds.0.639 rejoin_done
2015-10-06 03:58:28.159967 7f981c944700  1 mds.0.639 handle_mds_map state change up:rejoin --> up:active
2015-10-06 04:12:49.984929 7f1afa6d7700  1 mds.0.642 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 04:12:49.984932 7f1afa6d7700  1 mds.0.642 rejoin_start
2015-10-06 04:13:01.391428 7f1afa6d7700  1 mds.0.642 rejoin_joint_start
2015-10-06 04:14:38.680632 7f1af65ce700  1 mds.0.642 rejoin_done
2015-10-06 04:14:39.802623 7f1afa6d7700  1 mds.0.642 handle_mds_map state change up:rejoin --> up:active
2015-10-06 04:23:55.942713 7f028b2a9700  1 mds.0.645 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 04:23:55.942716 7f028b2a9700  1 mds.0.645 rejoin_start
2015-10-06 04:24:06.260830 7f028b2a9700  1 mds.0.645 rejoin_joint_start
2015-10-06 04:24:19.627641 7f028699f700  1 mds.0.645 suicide. wanted down:dne, now up:rejoin
2015-10-06 04:35:53.910743 7f32cee3d700  1 mds.0.648 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 04:35:53.910746 7f32cee3d700  1 mds.0.648 rejoin_start
2015-10-06 04:36:03.541504 7f32cee3d700  1 mds.0.648 rejoin_joint_start
2015-10-06 04:37:14.470805 7f32cad34700  1 mds.0.648 rejoin_done
2015-10-06 04:37:15.390864 7f32cee3d700  1 mds.0.648 handle_mds_map state change up:rejoin --> up:active
2015-10-06 04:40:46.878251 7f6600df0700  1 mds.0.651 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 04:40:46.878254 7f6600df0700  1 mds.0.651 rejoin_start
2015-10-06 04:40:57.984821 7f6600df0700  1 mds.0.651 rejoin_joint_start
2015-10-06 04:43:23.230549 7f65fcce7700  1 mds.0.651 rejoin_done
2015-10-06 04:43:23.841793 7f6600df0700  1 mds.0.651 handle_mds_map state change up:rejoin --> up:active
2015-10-06 04:50:56.961706 7fb5871a5700  1 mds.0.655 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 04:50:56.961709 7fb5871a5700  1 mds.0.655 rejoin_start
2015-10-06 04:51:06.743421 7fb5871a5700  1 mds.0.655 rejoin_joint_start
2015-10-06 04:51:09.134144 7fb58289b700  1 mds.0.655 suicide. wanted down:dne, now up:rejoin
2015-10-06 04:56:27.819070 7f64123e5700  1 mds.0.657 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 04:56:27.819072 7f64123e5700  1 mds.0.657 rejoin_start
2015-10-06 04:56:27.839223 7f64123e5700  1 mds.0.657 rejoin_joint_start
2015-10-06 04:56:30.375895 7f640e2dc700  1 mds.0.657 rejoin_done
2015-10-06 04:56:31.858593 7f64123e5700  1 mds.0.657 handle_mds_map state change up:rejoin --> up:clientreplay
2015-10-06 05:06:11.023545 7feef429a700  1 mds.0.660 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 05:06:11.023548 7feef429a700  1 mds.0.660 rejoin_start
2015-10-06 05:06:11.433153 7feef429a700  1 mds.0.660 rejoin_joint_start
2015-10-06 05:06:46.113313 7feef1a95700  1 mds.0.660 rejoin_done
2015-10-06 05:06:47.515843 7feef429a700  1 mds.0.660 handle_mds_map state change up:rejoin --> up:clientreplay
2015-10-06 09:42:59.932714 7fccadb81700  1 mds.0.664 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 09:42:59.932717 7fccadb81700  1 mds.0.664 rejoin_start
2015-10-06 09:43:00.497196 7fccadb81700  1 mds.0.664 rejoin_joint_start
2015-10-06 09:43:57.889918 7fcca9a78700  1 mds.0.664 rejoin_done
2015-10-06 09:43:58.490246 7fccadb81700  1 mds.0.664 handle_mds_map state change up:rejoin --> up:clientreplay
2015-10-06 10:42:24.162929 7f7ddf175700  1 mds.0.666 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 10:42:24.162931 7f7ddf175700  1 mds.0.666 rejoin_start
2015-10-06 10:42:38.235885 7f7ddf175700  1 mds.0.666 rejoin_joint_start
2015-10-06 10:47:30.636129 7f7ddb06c700  1 mds.0.666 rejoin_done
2015-10-06 10:47:32.037131 7f7ddf175700  1 mds.0.666 handle_mds_map state change up:rejoin --> up:active
2015-10-06 12:10:32.496677 7f94435f4700  1 mds.0.670 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 12:10:32.496681 7f94435f4700  1 mds.0.670 rejoin_start
2015-10-06 12:10:45.968556 7f94435f4700  1 mds.0.670 rejoin_joint_start
2015-10-06 12:14:10.590516 7f943ecea700  1 mds.0.670 suicide. wanted down:dne, now up:rejoin
2015-10-06 12:57:19.796554 7fbe9955b700  1 mds.0.676 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 12:57:19.796557 7fbe9955b700  1 mds.0.676 rejoin_start
2015-10-06 12:57:31.080582 7fbe9955b700  1 mds.0.676 rejoin_joint_start
2015-10-06 12:59:39.291300 7fbe95452700  1 mds.0.676 rejoin_done
2015-10-06 12:59:40.162822 7fbe9955b700  1 mds.0.676 handle_mds_map state change up:rejoin --> up:active
2015-10-06 14:41:48.552281 7f8bc218d700  1 mds.0.681 handle_mds_map state change up:reconnect --> up:rejoin
2015-10-06 14:41:48.552284 7f8bc218d700  1 mds.0.681 rejoin_start
2015-10-06 14:41:49.242241 7f8bc218d700  1 mds.0.681 rejoin_joint_start
2015-10-06 14:42:32.421263 7f8bbe084700  1 mds.0.681 rejoin_done
2015-10-06 14:42:33.341350 7f8bc218d700  1 mds.0.681 handle_mds_map state change up:rejoin --> up:active
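
(To save anyone the arithmetic, a small sketch that turns this output into seconds per MDS epoch; it assumes, as above, that all entries fall on the same day:)

  # print "date mds.<rank>.<epoch>: N s in rejoin" for each completed rejoin
  grep rejoin ceph-mds.b.log | awk '
      $6 == "rejoin_start" { split($2, t, ":"); t0 = t[1]*3600 + t[2]*60 + t[3] }
      $6 == "rejoin_done"  { split($2, t, ":");
                             printf "%s %s: %.1f s in rejoin\n",
                                    $1, $5, t[1]*3600 + t[2]*60 + t[3] - t0 }'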



John


PS I know - too many PGs; hence "mon pg warn max per osd = 1400"...





My current mds config:

[mds]
          mds recall state timeout = 120
          mds bal mode = 1
          mds standby replay = true
          mds cache size = 500000
          mds mem max = 2097152
          mds op history size = 50
          # vs. laggy beacon
          mds decay halflife = 9
          mds beacon interval = 8
          mds beacon grace = 30

[mds.a]
          host = megaserver1
[mds.b]
          host = megaserver3
[mds.c]
          host = megaserver4

(I am trying to revert all the non-default settings; IMHO no change in results so far - still to verify.)
Or maybe I need to take special care when stopping an MDS (currently it gets SIGKILL).
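
(To confirm which of the settings above actually reach the running daemon, I check the admin socket on each MDS host - a sketch, with mds.b as the example id:)

  # dump the live configuration of the local mds.b and pick out the
  # options set in the [mds] section above
  ceph daemon mds.b config show | grep -E 'mds_(cache_size|beacon|decay|recall)'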

--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



