Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

Hi all,

I'm just updating our test cluster from 12.2.2 to 12.2.4. The MONs and
OSDs updated fine.

When updating the MDSs (we have 2 active and 1 standby), I started
with the standby.

The moment the standby MDS restarted into 12.2.4 [1], both active
MDSs (still running 12.2.2) suicided like this:

2018-02-28 10:25:22.761413 7f03da1b9700  0 mds.cephdwightmds0 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
2018-02-28 10:25:22.761429 7f03da1b9700  1 mds.cephdwightmds0 suicide. wanted state up:active
2018-02-28 10:25:23.763226 7f03da1b9700  1 mds.0.18147 shutdown: shutting down rank 0


2018-02-28 10:25:22.761590 7f11df538700  0 mds.cephdwightmds1 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
2018-02-28 10:25:22.761613 7f11df538700  1 mds.cephdwightmds1 suicide. wanted state up:active
2018-02-28 10:25:23.765653 7f11df538700  1 mds.1.18366 shutdown: shutting down rank 1
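
For anyone wondering what that message actually means: as far as I can
tell, the MDS compares the incompat feature ids declared in the mdsmap's
compat set against the set of ids it was built with, and kills itself if
the map declares anything it doesn't recognise. Below is a minimal Python
sketch of that subset test, using the feature ids from the logs above; it
is a simplified model for illustration, not the actual Ceph C++ code.

# Simplified model of the compat-set "writeable" check behind the
# "not writeable with daemon features ... killing myself" message.
# Illustration only; the real check lives in the Ceph C++ code
# (CompatSet / MDSDaemon::handle_mds_map).

def writeable(daemon_incompat, mdsmap_incompat):
    # The daemon can keep running only if it supports every incompat
    # feature id that the mdsmap's compat set declares.
    return set(mdsmap_incompat) <= set(daemon_incompat)

# Feature ids taken from the log lines above.
mdsmap_incompat = {
    1: "base v0.20", 2: "client writeable ranges",
    3: "default file layouts on dirs", 4: "dir inode in separate object",
    5: "mds uses versioned encoding", 6: "dirfrag is stored in omap",
    8: "no anchor table", 9: "file layout v2",
}
daemon_incompat_12_2_2 = {
    1: "base v0.20", 2: "client writeable ranges",
    3: "default file layouts on dirs", 4: "dir inode in separate object",
    5: "mds uses versioned encoding", 6: "dirfrag is stored in omap",
    7: "mds uses inline data", 8: "file layout v2",
}

# Id 9 is in the map but not in the daemon's set, so the check fails
# and the 12.2.2 MDS suicides.
print(writeable(daemon_incompat_12_2_2, mdsmap_incompat))  # False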



The CephFS cluster was down until I updated all MDSs to 12.2.4; after
that they restarted cleanly.

Looks like a pretty serious bug??!!
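
If it helps anyone, here's a rough way to dump what the current mdsmap
advertises before upgrading, so it can be compared against the feature
sets the daemons log. This assumes the JSON output of "ceph fs dump
--format json" includes the compat section; the exact field names may
differ between releases, so treat it as a sketch rather than a tested
tool:

# Hypothetical helper: print each filesystem's mdsmap incompat features.
# Field names ("filesystems", "mdsmap", "compat", "incompat") are
# assumptions about the JSON layout; adjust to what your release prints.
import json
import subprocess

out = subprocess.check_output(["ceph", "fs", "dump", "--format", "json"])
fsmap = json.loads(out)

for fs in fsmap.get("filesystems", []):
    mdsmap = fs.get("mdsmap", {})
    name = mdsmap.get("fs_name", "?")
    incompat = mdsmap.get("compat", {}).get("incompat", {})
    print(name, incompat)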

Cheers, Dan


[1] Here is the standby restarting, 4 seconds before the active MDSs suicided:

2018-02-28 10:25:18.222865 7f9f1ea3b1c0  0 set uid:gid to 167:167 (ceph:ceph)
2018-02-28 10:25:18.222892 7f9f1ea3b1c0  0 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process (unknown), pid 10648


