Re: Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

On Wed, Feb 28, 2018 at 9:37 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> Hi all,
>
> I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and
> OSD's updated fine.
>
> When updating the MDS's (we have 2 active and 1 standby), I started
> with the standby.
>
> At the moment the standby MDS restarted into 12.2.4 [1], both active
> MDSs (still running 12.2.2) suicided like this:
>
> 2018-02-28 10:25:22.761413 7f03da1b9700  0 mds.cephdwightmds0
> handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base
> v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir
> inode in separate object,5=mds uses versioned encoding,6=dirfrag is
> stored in omap,8=no anchor table,9=file layout v2} not writeable with
> daemon features compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in
> separate object,5=mds uses versioned encoding,6=dirfrag is stored in
> omap,7=mds uses inline data,8=file layout v2}, killing myself
> 2018-02-28 10:25:22.761429 7f03da1b9700  1 mds.cephdwightmds0 suicide.
> wanted state up:active
> 2018-02-28 10:25:23.763226 7f03da1b9700  1 mds.0.18147 shutdown:
> shutting down rank 0
>
>
> 2018-02-28 10:25:22.761590 7f11df538700  0 mds.cephdwightmds1
> handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base
> v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir
> inode in separate object,5=mds uses versioned encoding,6=dirfrag is
> stored in omap,8=no anchor table,9=file layout v2} not writeable with
> daemon features compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in
> separate object,5=mds uses versioned encoding,6=dirfrag is stored in
> omap,7=mds uses inline data,8=file layout v2}, killing myself
> 2018-02-28 10:25:22.761613 7f11df538700  1 mds.cephdwightmds1 suicide.
> wanted state up:active
> 2018-02-28 10:25:23.765653 7f11df538700  1 mds.1.18366 shutdown:
> shutting down rank 1

That's not good!

From looking at the commits between 12.2.2 and 12.2.4, this one looks
suspicious:

commit ddba907279719631903e3a20543056d81d176a1b
Author: Yan, Zheng <zyan@xxxxxxxxxx>
Date:   Tue Oct 31 16:56:51 2017 +0800

    mds: fix MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 definition

    Fixes: http://tracker.ceph.com/issues/21985
    Signed-off-by: "Yan, Zheng" <zyan@xxxxxxxxxx>
    (cherry picked from commit 6c1543dfc55d6db8493535b9b62a30236cf8c638)
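
To make the difference between the two incompat sets in the log easier
to see, here is a small sketch that diffs them. It assumes the
"not writeable" check amounts to "every incompat feature ID in the
mdsmap must also be advertised by the daemon"; that is a reading of the
log message, not a copy of the Ceph code. The helper names
(missing_features, renamed_features, writeable) are made up for this
illustration.

```python
# Incompat feature sets copied from the handle_mds_map log line.
MDSMAP_INCOMPAT = {
    1: "base v0.20",
    2: "client writeable ranges",
    3: "default file layouts on dirs",
    4: "dir inode in separate object",
    5: "mds uses versioned encoding",
    6: "dirfrag is stored in omap",
    8: "no anchor table",
    9: "file layout v2",
}

DAEMON_INCOMPAT = {
    1: "base v0.20",
    2: "client writeable ranges",
    3: "default file layouts on dirs",
    4: "dir inode in separate object",
    5: "mds uses versioned encoding",
    6: "dirfrag is stored in omap",
    7: "mds uses inline data",
    8: "file layout v2",
}


def missing_features(map_incompat, daemon_incompat):
    """Feature IDs the map requires but the daemon does not advertise."""
    return sorted(set(map_incompat) - set(daemon_incompat))


def renamed_features(map_incompat, daemon_incompat):
    """IDs present on both sides whose names differ: id -> (daemon, map)."""
    shared = sorted(set(map_incompat) & set(daemon_incompat))
    return {
        fid: (daemon_incompat[fid], map_incompat[fid])
        for fid in shared
        if map_incompat[fid] != daemon_incompat[fid]
    }


def writeable(map_incompat, daemon_incompat):
    """Assumed rule: the daemon may write only if no required ID is missing."""
    return not missing_features(map_incompat, daemon_incompat)


if __name__ == "__main__":
    print("missing on daemon:", missing_features(MDSMAP_INCOMPAT, DAEMON_INCOMPAT))
    print("same id, different name:", renamed_features(MDSMAP_INCOMPAT, DAEMON_INCOMPAT))
    print("writeable:", writeable(MDSMAP_INCOMPAT, DAEMON_INCOMPAT))
```

Under that assumption, the 12.2.2 daemons lack ID 9, and ID 8 means
"file layout v2" to them but "no anchor table" in the new map, which is
consistent with the commit above changing the FILE_LAYOUT_V2
definition.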

John



>
>
> The cephfs cluster was down until I updated all MDS's to 12.2.4 --
> then they restarted cleanly.
>
> Looks like a pretty serious bug??!!
>
> Cheers, Dan
>
>
> [1] here is the standby restarting, 4 seconds before the active MDS's suicided:
>
> 2018-02-28 10:25:18.222865 7f9f1ea3b1c0  0 set uid:gid to 167:167 (ceph:ceph)
> 2018-02-28 10:25:18.222892 7f9f1ea3b1c0  0 ceph version 12.2.4
> (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process
> (unknown), pid 10648
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


