Re: avoid 3-mds fs laggy on 1 rejoin?

On Tue, Oct 6, 2015 at 1:22 PM, Dzianis Kahanovich
<mahatma@xxxxxxxxxxxxxx> wrote:
> Even now, with "mds standby replay = true" removed:
> e7151: 1/1/1 up {0=b=up:active}, 2 up:standby
> The cluster gets stuck when I KILL the active mds.b. How do I correctly
> stop an mds to get behaviour like the MONs' - leader->down/peon->leader?

It's not clear to me why you're saying it's stuck.  Is it stuck, or is it slow?

From that log it looks like you're restarting mds.b many times in one
day, which is kind of unusual.  Are you really doing all those restarts
by hand, or is something else going wrong?

The expected behaviour is that when mds.b restarts, the mon notices
that the old mds.b instance is dead, and hands the role to the
standby-replay mds.
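
If you want to hand the rank over deliberately instead of waiting for
the beacon grace to expire, something along these lines usually does it
(a sketch only - exact syntax depends on your Ceph version, and the
rank/daemon names come from your status output above):

    ceph mds stat      # confirm which daemon currently holds rank 0
    ceph mds fail 0    # mark rank 0 failed so the standby takes over right away
    ceph -w            # watch the standby move through replay/rejoin to active

That should get you something much closer to the mon leader/peon
handover you were asking about.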

> Usually I upgrade ceph with a simple "/etc/init.d/ceph restart" (KILLs +
> starts). It surprised me that only the MDS needs special actions.

What special actions are you having to perform?  It looks like your
cluster is coming back online eventually?

John

>
>>
>> You could try setting a higher debug level (e.g. debug mds = 10) on
>> your MDS before it takes over, so that the log output can give us an
>> idea of what the daemon is doing while it's stuck in rejoin.
>
>
> OK, I'll do the debugging later - this is production ;). But even a 5 min.
> rejoin shouldn't be too much of a problem while the other 2 nodes are up. Even
> degraded OSDs stay accessible (given balanced size/min_size & CRUSH).
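
(For what it's worth, you don't necessarily need a restart for that: on
most versions you can bump the level on the running daemon, e.g.

    ceph tell mds.b injectargs '--debug-mds 10'

and set it back down afterwards.  Treat the exact syntax as a sketch for
your version.)
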
>
> As for the rejoin times - it's simple:
> # grep rejoin ceph-mds.b.log
> 2015-10-06 03:16:28.102415 7f91d0852700  1 mds.0.634 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 03:16:28.102417 7f91d0852700  1 mds.0.634 rejoin_start
> 2015-10-06 03:16:33.028668 7f91d0852700  1 mds.0.634 rejoin_joint_start
> 2015-10-06 03:18:05.266657 7f91cc749700  1 mds.0.634 rejoin_done
> 2015-10-06 03:18:06.070233 7f91d0852700  1 mds.0.634 handle_mds_map state
> change up:rejoin --> up:active
> 2015-10-06 03:31:22.860780 7f8ab6643700  1 mds.0.636 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 03:31:22.860783 7f8ab6643700  1 mds.0.636 rejoin_start
> 2015-10-06 03:31:32.771089 7f8ab6643700  1 mds.0.636 rejoin_joint_start
> 2015-10-06 03:32:48.644880 7f8ab2439700  1 mds.0.636 rejoin_done
> 2015-10-06 03:32:49.412352 7f8ab6643700  1 mds.0.636 handle_mds_map state
> change up:rejoin --> up:active
> 2015-10-06 03:57:03.625397 7f981c944700  1 mds.0.639 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 03:57:03.625400 7f981c944700  1 mds.0.639 rejoin_start
> 2015-10-06 03:57:14.561840 7f981c944700  1 mds.0.639 rejoin_joint_start
> 2015-10-06 03:58:26.875557 7f981883b700  1 mds.0.639 rejoin_done
> 2015-10-06 03:58:28.159967 7f981c944700  1 mds.0.639 handle_mds_map state
> change up:rejoin --> up:active
> 2015-10-06 04:12:49.984929 7f1afa6d7700  1 mds.0.642 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 04:12:49.984932 7f1afa6d7700  1 mds.0.642 rejoin_start
> 2015-10-06 04:13:01.391428 7f1afa6d7700  1 mds.0.642 rejoin_joint_start
> 2015-10-06 04:14:38.680632 7f1af65ce700  1 mds.0.642 rejoin_done
> 2015-10-06 04:14:39.802623 7f1afa6d7700  1 mds.0.642 handle_mds_map state
> change up:rejoin --> up:active
> 2015-10-06 04:23:55.942713 7f028b2a9700  1 mds.0.645 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 04:23:55.942716 7f028b2a9700  1 mds.0.645 rejoin_start
> 2015-10-06 04:24:06.260830 7f028b2a9700  1 mds.0.645 rejoin_joint_start
> 2015-10-06 04:24:19.627641 7f028699f700  1 mds.0.645 suicide.  wanted
> down:dne, now up:rejoin
> 2015-10-06 04:35:53.910743 7f32cee3d700  1 mds.0.648 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 04:35:53.910746 7f32cee3d700  1 mds.0.648 rejoin_start
> 2015-10-06 04:36:03.541504 7f32cee3d700  1 mds.0.648 rejoin_joint_start
> 2015-10-06 04:37:14.470805 7f32cad34700  1 mds.0.648 rejoin_done
> 2015-10-06 04:37:15.390864 7f32cee3d700  1 mds.0.648 handle_mds_map state
> change up:rejoin --> up:active
> 2015-10-06 04:40:46.878251 7f6600df0700  1 mds.0.651 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 04:40:46.878254 7f6600df0700  1 mds.0.651 rejoin_start
> 2015-10-06 04:40:57.984821 7f6600df0700  1 mds.0.651 rejoin_joint_start
> 2015-10-06 04:43:23.230549 7f65fcce7700  1 mds.0.651 rejoin_done
> 2015-10-06 04:43:23.841793 7f6600df0700  1 mds.0.651 handle_mds_map state
> change up:rejoin --> up:active
> 2015-10-06 04:50:56.961706 7fb5871a5700  1 mds.0.655 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 04:50:56.961709 7fb5871a5700  1 mds.0.655 rejoin_start
> 2015-10-06 04:51:06.743421 7fb5871a5700  1 mds.0.655 rejoin_joint_start
> 2015-10-06 04:51:09.134144 7fb58289b700  1 mds.0.655 suicide.  wanted
> down:dne, now up:rejoin
> 2015-10-06 04:56:27.819070 7f64123e5700  1 mds.0.657 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 04:56:27.819072 7f64123e5700  1 mds.0.657 rejoin_start
> 2015-10-06 04:56:27.839223 7f64123e5700  1 mds.0.657 rejoin_joint_start
> 2015-10-06 04:56:30.375895 7f640e2dc700  1 mds.0.657 rejoin_done
> 2015-10-06 04:56:31.858593 7f64123e5700  1 mds.0.657 handle_mds_map state
> change up:rejoin --> up:clientreplay
> 2015-10-06 05:06:11.023545 7feef429a700  1 mds.0.660 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 05:06:11.023548 7feef429a700  1 mds.0.660 rejoin_start
> 2015-10-06 05:06:11.433153 7feef429a700  1 mds.0.660 rejoin_joint_start
> 2015-10-06 05:06:46.113313 7feef1a95700  1 mds.0.660 rejoin_done
> 2015-10-06 05:06:47.515843 7feef429a700  1 mds.0.660 handle_mds_map state
> change up:rejoin --> up:clientreplay
> 2015-10-06 09:42:59.932714 7fccadb81700  1 mds.0.664 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 09:42:59.932717 7fccadb81700  1 mds.0.664 rejoin_start
> 2015-10-06 09:43:00.497196 7fccadb81700  1 mds.0.664 rejoin_joint_start
> 2015-10-06 09:43:57.889918 7fcca9a78700  1 mds.0.664 rejoin_done
> 2015-10-06 09:43:58.490246 7fccadb81700  1 mds.0.664 handle_mds_map state
> change up:rejoin --> up:clientreplay
> 2015-10-06 10:42:24.162929 7f7ddf175700  1 mds.0.666 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 10:42:24.162931 7f7ddf175700  1 mds.0.666 rejoin_start
> 2015-10-06 10:42:38.235885 7f7ddf175700  1 mds.0.666 rejoin_joint_start
> 2015-10-06 10:47:30.636129 7f7ddb06c700  1 mds.0.666 rejoin_done
> 2015-10-06 10:47:32.037131 7f7ddf175700  1 mds.0.666 handle_mds_map state
> change up:rejoin --> up:active
> 2015-10-06 12:10:32.496677 7f94435f4700  1 mds.0.670 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 12:10:32.496681 7f94435f4700  1 mds.0.670 rejoin_start
> 2015-10-06 12:10:45.968556 7f94435f4700  1 mds.0.670 rejoin_joint_start
> 2015-10-06 12:14:10.590516 7f943ecea700  1 mds.0.670 suicide.  wanted
> down:dne, now up:rejoin
> 2015-10-06 12:57:19.796554 7fbe9955b700  1 mds.0.676 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 12:57:19.796557 7fbe9955b700  1 mds.0.676 rejoin_start
> 2015-10-06 12:57:31.080582 7fbe9955b700  1 mds.0.676 rejoin_joint_start
> 2015-10-06 12:59:39.291300 7fbe95452700  1 mds.0.676 rejoin_done
> 2015-10-06 12:59:40.162822 7fbe9955b700  1 mds.0.676 handle_mds_map state
> change up:rejoin --> up:active
> 2015-10-06 14:41:48.552281 7f8bc218d700  1 mds.0.681 handle_mds_map state
> change up:reconnect --> up:rejoin
> 2015-10-06 14:41:48.552284 7f8bc218d700  1 mds.0.681 rejoin_start
> 2015-10-06 14:41:49.242241 7f8bc218d700  1 mds.0.681 rejoin_joint_start
> 2015-10-06 14:42:32.421263 7f8bbe084700  1 mds.0.681 rejoin_done
> 2015-10-06 14:42:33.341350 7f8bc218d700  1 mds.0.681 handle_mds_map state
> change up:rejoin --> up:active
>
>
>
>>
>> John
>>
>>>
>>> PS I know - too many PGs; "mon pg warn max per osd = 1400"...
>>>
>>>
>>>>
>>>> John
>>>>
>>>>>
>>>>> My current mds config:
>>>>>
>>>>> [mds]
>>>>>           mds recall state timeout = 120
>>>>>           mds bal mode = 1
>>>>>           mds standby replay = true
>>>>>           mds cache size = 500000
>>>>>           mds mem max = 2097152
>>>>>           mds op history size = 50
>>>>>           # vs. laggy beacon
>>>>>           mds decay halflife = 9
>>>>>           mds beacon interval = 8
>>>>>           mds beacon grace = 30
>>>>>
>>>>> [mds.a]
>>>>>           host = megaserver1
>>>>> [mds.b]
>>>>>           host = megaserver3
>>>>> [mds.c]
>>>>>           host = megaserver4
>>>>>
>>>>> (I'm trying to switch all the non-defaults back off, IMHO with no
>>>>> results so far - fixme)
>>>>> Or maybe I need to take special care when stopping an mds (currently - SIGKILL).
>>>>>
>>
>
>
> --
> WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


