Re: avoid 3-mds fs laggy on 1 rejoin?

John Spray writes:
On Tue, Oct 6, 2015 at 1:22 PM, Dzianis Kahanovich
<mahatma@xxxxxxxxxxxxxx> wrote:
Even now, with "mds standby replay = true" removed:
e7151: 1/1/1 up {0=b=up:active}, 2 up:standby
the cluster gets stuck when the active mds.b is killed. How do I correctly stop
an MDS to get behaviour like the MONs have - leader goes down, a peon becomes leader?

It's not clear to me why you're saying it's stuck.  Is it stuck, or is it slow?

It is completely stuck (not slow) until HEALTH_OK, i.e. until rejoin completes; meanwhile it reports "mds cluster degraded".


From that log it looks like you're restarting mds.b many times in one
day, that's kind of unusual.  Are you really doing all those restarts
by hand, or is something else going wrong?

The expected behaviour is that when mds.b restarts, the mon notices
that the old mds.b instance is dead, and hands the role to the
standby-replay mds.

Usually I upgrade Ceph with a simple "/etc/init.d/ceph restart" (kills +
starts). The surprise for me is that only the MDS seems to need special handling.

What special actions are you having to perform?  It looks like your
cluster is coming back online eventually?

I haven't tested it yet, but something like:
ceph mds stop <who>
ceph mds deactivate <who>
ceph mds tell <who> <args> [<args>...]
- run before the KILL

- i.e. something that tells the MDS to release its "active" status and hand it
to another daemon. I am also looking at "mds shutdown check = <int>" (?).
Or, if nothing like this exists, fix the MDS to do it itself on KILL.
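For what it's worth, a minimal sketch of "fail the rank over first, then restart the daemon". It assumes rank 0 is held by mds.b (as in the status line quoted above) and that a standby is available; whether "ceph mds fail" actually avoids the long rejoin here is exactly the open question:

```shell
# Sketch only: hand rank 0 to a standby before restarting the daemon
# that holds it. Rank number and daemon name (mds.b) are assumptions
# taken from the "0=b=up:active, 2 up:standby" status above.
failover_then_restart() {
    ceph mds fail 0                   # mark rank 0 failed so a standby claims it
    ceph mds stat                     # verify the standby went active before proceeding
    /etc/init.d/ceph restart mds.b    # restart the old daemon; it comes back as standby
}

# Only run against a live cluster:
if command -v ceph >/dev/null 2>&1; then
    failover_then_restart
fi
```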


John



You could try setting a higher debug level (e.g. debug mds = 10) on
your MDS before it takes over, so that the log output can give us an
idea of what the daemon is doing while it's stuck in rejoin.


OK, I'll enable debug later - this is production ;). But even a 5-minute rejoin
should not be much of a problem while the other 2 nodes are up. Even degraded
OSDs stay accessible (given balanced size/min_size & CRUSH).

As for the rejoin times, simply:
# grep rejoin ceph-mds.b.log
2015-10-06 03:16:28.102415 7f91d0852700  1 mds.0.634 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 03:16:28.102417 7f91d0852700  1 mds.0.634 rejoin_start
2015-10-06 03:16:33.028668 7f91d0852700  1 mds.0.634 rejoin_joint_start
2015-10-06 03:18:05.266657 7f91cc749700  1 mds.0.634 rejoin_done
2015-10-06 03:18:06.070233 7f91d0852700  1 mds.0.634 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 03:31:22.860780 7f8ab6643700  1 mds.0.636 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 03:31:22.860783 7f8ab6643700  1 mds.0.636 rejoin_start
2015-10-06 03:31:32.771089 7f8ab6643700  1 mds.0.636 rejoin_joint_start
2015-10-06 03:32:48.644880 7f8ab2439700  1 mds.0.636 rejoin_done
2015-10-06 03:32:49.412352 7f8ab6643700  1 mds.0.636 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 03:57:03.625397 7f981c944700  1 mds.0.639 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 03:57:03.625400 7f981c944700  1 mds.0.639 rejoin_start
2015-10-06 03:57:14.561840 7f981c944700  1 mds.0.639 rejoin_joint_start
2015-10-06 03:58:26.875557 7f981883b700  1 mds.0.639 rejoin_done
2015-10-06 03:58:28.159967 7f981c944700  1 mds.0.639 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 04:12:49.984929 7f1afa6d7700  1 mds.0.642 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 04:12:49.984932 7f1afa6d7700  1 mds.0.642 rejoin_start
2015-10-06 04:13:01.391428 7f1afa6d7700  1 mds.0.642 rejoin_joint_start
2015-10-06 04:14:38.680632 7f1af65ce700  1 mds.0.642 rejoin_done
2015-10-06 04:14:39.802623 7f1afa6d7700  1 mds.0.642 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 04:23:55.942713 7f028b2a9700  1 mds.0.645 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 04:23:55.942716 7f028b2a9700  1 mds.0.645 rejoin_start
2015-10-06 04:24:06.260830 7f028b2a9700  1 mds.0.645 rejoin_joint_start
2015-10-06 04:24:19.627641 7f028699f700  1 mds.0.645 suicide.  wanted
down:dne, now up:rejoin
2015-10-06 04:35:53.910743 7f32cee3d700  1 mds.0.648 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 04:35:53.910746 7f32cee3d700  1 mds.0.648 rejoin_start
2015-10-06 04:36:03.541504 7f32cee3d700  1 mds.0.648 rejoin_joint_start
2015-10-06 04:37:14.470805 7f32cad34700  1 mds.0.648 rejoin_done
2015-10-06 04:37:15.390864 7f32cee3d700  1 mds.0.648 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 04:40:46.878251 7f6600df0700  1 mds.0.651 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 04:40:46.878254 7f6600df0700  1 mds.0.651 rejoin_start
2015-10-06 04:40:57.984821 7f6600df0700  1 mds.0.651 rejoin_joint_start
2015-10-06 04:43:23.230549 7f65fcce7700  1 mds.0.651 rejoin_done
2015-10-06 04:43:23.841793 7f6600df0700  1 mds.0.651 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 04:50:56.961706 7fb5871a5700  1 mds.0.655 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 04:50:56.961709 7fb5871a5700  1 mds.0.655 rejoin_start
2015-10-06 04:51:06.743421 7fb5871a5700  1 mds.0.655 rejoin_joint_start
2015-10-06 04:51:09.134144 7fb58289b700  1 mds.0.655 suicide.  wanted
down:dne, now up:rejoin
2015-10-06 04:56:27.819070 7f64123e5700  1 mds.0.657 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 04:56:27.819072 7f64123e5700  1 mds.0.657 rejoin_start
2015-10-06 04:56:27.839223 7f64123e5700  1 mds.0.657 rejoin_joint_start
2015-10-06 04:56:30.375895 7f640e2dc700  1 mds.0.657 rejoin_done
2015-10-06 04:56:31.858593 7f64123e5700  1 mds.0.657 handle_mds_map state
change up:rejoin --> up:clientreplay
2015-10-06 05:06:11.023545 7feef429a700  1 mds.0.660 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 05:06:11.023548 7feef429a700  1 mds.0.660 rejoin_start
2015-10-06 05:06:11.433153 7feef429a700  1 mds.0.660 rejoin_joint_start
2015-10-06 05:06:46.113313 7feef1a95700  1 mds.0.660 rejoin_done
2015-10-06 05:06:47.515843 7feef429a700  1 mds.0.660 handle_mds_map state
change up:rejoin --> up:clientreplay
2015-10-06 09:42:59.932714 7fccadb81700  1 mds.0.664 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 09:42:59.932717 7fccadb81700  1 mds.0.664 rejoin_start
2015-10-06 09:43:00.497196 7fccadb81700  1 mds.0.664 rejoin_joint_start
2015-10-06 09:43:57.889918 7fcca9a78700  1 mds.0.664 rejoin_done
2015-10-06 09:43:58.490246 7fccadb81700  1 mds.0.664 handle_mds_map state
change up:rejoin --> up:clientreplay
2015-10-06 10:42:24.162929 7f7ddf175700  1 mds.0.666 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 10:42:24.162931 7f7ddf175700  1 mds.0.666 rejoin_start
2015-10-06 10:42:38.235885 7f7ddf175700  1 mds.0.666 rejoin_joint_start
2015-10-06 10:47:30.636129 7f7ddb06c700  1 mds.0.666 rejoin_done
2015-10-06 10:47:32.037131 7f7ddf175700  1 mds.0.666 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 12:10:32.496677 7f94435f4700  1 mds.0.670 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 12:10:32.496681 7f94435f4700  1 mds.0.670 rejoin_start
2015-10-06 12:10:45.968556 7f94435f4700  1 mds.0.670 rejoin_joint_start
2015-10-06 12:14:10.590516 7f943ecea700  1 mds.0.670 suicide.  wanted
down:dne, now up:rejoin
2015-10-06 12:57:19.796554 7fbe9955b700  1 mds.0.676 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 12:57:19.796557 7fbe9955b700  1 mds.0.676 rejoin_start
2015-10-06 12:57:31.080582 7fbe9955b700  1 mds.0.676 rejoin_joint_start
2015-10-06 12:59:39.291300 7fbe95452700  1 mds.0.676 rejoin_done
2015-10-06 12:59:40.162822 7fbe9955b700  1 mds.0.676 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 14:41:48.552281 7f8bc218d700  1 mds.0.681 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 14:41:48.552284 7f8bc218d700  1 mds.0.681 rejoin_start
2015-10-06 14:41:49.242241 7f8bc218d700  1 mds.0.681 rejoin_joint_start
2015-10-06 14:42:32.421263 7f8bbe084700  1 mds.0.681 rejoin_done
2015-10-06 14:42:33.341350 7f8bc218d700  1 mds.0.681 handle_mds_map state
change up:rejoin --> up:active
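Those start/done pairs can be reduced to per-restart durations with a bit of awk; this sketch embeds two pairs from the log above as a sample, and the full ceph-mds.b.log can be piped through the same script instead:

```shell
# Compute rejoin duration per restart from rejoin_start/rejoin_done
# timestamps ($1 is the date, $2 the HH:MM:SS.frac time field).
# The embedded sample is two pairs copied from the log above.
durations=$(awk '
  /rejoin_start/ { split($2, t, ":"); start = t[1]*3600 + t[2]*60 + t[3] }
  /rejoin_done/  { split($2, t, ":")
                   printf "%s rejoin took %.0fs\n", $1, t[1]*3600 + t[2]*60 + t[3] - start }
' <<'EOF'
2015-10-06 03:16:28.102417 7f91d0852700  1 mds.0.634 rejoin_start
2015-10-06 03:18:05.266657 7f91cc749700  1 mds.0.634 rejoin_done
2015-10-06 10:42:24.162931 7f7ddf175700  1 mds.0.666 rejoin_start
2015-10-06 10:47:30.636129 7f7ddb06c700  1 mds.0.666 rejoin_done
EOF
)
echo "$durations"
```

The same pattern over the whole log shows rejoin ranging from a few seconds up to roughly five minutes (the 10:42 -> 10:47 restart).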




John


PS: I know - too many PGs, hence "mon pg warn max per osd = 1400"...



John


My current mds config:

[mds]
           mds recall state timeout = 120
           mds bal mode = 1
           mds standby replay = true
           mds cache size = 500000
           mds mem max = 2097152
           mds op history size = 50
           # vs. laggy beacon
           mds decay halflife = 9
           mds beacon interval = 8
           mds beacon grace = 30

[mds.a]
           host = megaserver1
[mds.b]
           host = megaserver3
[mds.c]
           host = megaserver4
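If the goal is a MON-like instant takeover, the standby-replay options of that era could also be pinned per daemon rather than set globally. A hypothetical variant (assuming these per-daemon options exist in your release) that keeps mds.c as a hot standby for rank 0:

```
[mds.c]
           host = megaserver4
           mds standby replay = true
           mds standby for rank = 0
```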

(I am trying to revert all non-default settings; IMHO no results so far - fixme.)
Or maybe the MDS just needs special care on stop (currently it gets SIGKILL).

--
WBR, Dzianis Kahanovich AKA Denis Kaganovich,
http://mahatma.bspu.unibel.by/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





