Re: Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

Hmm, it looks like I restarted everything except the MDS daemons...
So it's the same issue! The MDS daemons were still running 12.2.2, which is why they killed themselves when one of the monitors rebooted.

Thanks Dan!

Adrien
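
A minimal sketch of how a mismatch like this could be spotted before restarting anything. It assumes a JSON report shaped like the output of `ceph versions` (daemon type mapped to running version strings and counts). The function name `mixed_versions` and the sample data are hypothetical, and the version strings are shortened for illustration.

```python
import json


def mixed_versions(versions_json, daemon_type="mds"):
    """Return the sorted version strings if more than one is running, else []."""
    report = json.loads(versions_json)
    running = report.get(daemon_type, {})
    return sorted(running) if len(running) > 1 else []


# Hypothetical sample, not taken from the cluster in this thread.
SAMPLE = json.dumps({
    "mon": {"12.2.4": 3},
    "mds": {"12.2.2": 2, "12.2.4": 1},
})

if __name__ == "__main__":
    for daemon_type in ("mon", "mds"):
        found = mixed_versions(SAMPLE, daemon_type)
        status = "mixed: " + ", ".join(found) if found else "consistent"
        print(daemon_type, status)
```

The point is only that the running versions matter, not the installed packages: a daemon that was never restarted keeps running the old code.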

On 28/03/2018 at 16:43, Dan van der Ster wrote:
Do you have the startup banners for mds.cccephadm14 and 15? It sure
looks like they were running 12.2.2 with the "not writeable with
daemon features" error.

-- dan

On Wed, Mar 28, 2018 at 3:12 PM, adrien.georget@xxxxxxxxxxx
<adrien.georget@xxxxxxxxxxx> wrote:
Hi,

All Ceph services were running version 12.2.4.

Adrien


On 28/03/2018 at 14:47, Dan van der Ster wrote:
Hi,

Which versions were those MDSs running before and after the standby
MDS was restarted?

Cheers, Dan



On Wed, Mar 28, 2018 at 11:11 AM, adrien.georget@xxxxxxxxxxx
<adrien.georget@xxxxxxxxxxx> wrote:
Hi,

I just had the same issue with our 12.2.4 cluster, but not during the
upgrade.
One of our 3 monitors restarted (the one with a standby MDS) and the 2
other active MDSs killed themselves:

2018-03-28 09:36:24.376888 7f910bc0f700  0 mds.cccephadm14 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
2018-03-28 09:36:24.376903 7f910bc0f700  1 mds.cccephadm14 suicide. wanted state up:active
2018-03-28 09:36:25.379607 7f910bc0f700  1 mds.1.62 shutdown: shutting down rank 1


2018-03-28 09:36:24.375867 7fad455bf700  0 mds.cccephadm15 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
2018-03-28 09:36:24.375883 7fad455bf700  1 mds.cccephadm15 suicide. wanted state up:active
2018-03-28 09:36:25.377633 7fad455bf700  1 mds.0.50 shutdown: shutting down rank 0

I had to restart the MDS services manually to get them working again.
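
To make the mismatch in the logs above easier to see, here is a minimal sketch that compares the two incompat sets. The feature ids and names are copied from the log lines. The helper names (`missing_features`, `renamed_features`, `is_writeable`) are hypothetical, and the check assumes compatibility is decided by feature id alone, which is a simplification and not Ceph's actual CompatSet code.

```python
# Features shared by both sets in the log lines.
COMMON = {
    1: "base v0.20",
    2: "client writeable ranges",
    3: "default file layouts on dirs",
    4: "dir inode in separate object",
    5: "mds uses versioned encoding",
    6: "dirfrag is stored in omap",
}

# incompat set required by the MDS map (first set in the log message).
MDSMAP_INCOMPAT = {**COMMON, 8: "no anchor table", 9: "file layout v2"}

# incompat set supported by the running daemon (second set in the log message).
DAEMON_INCOMPAT = {**COMMON, 7: "mds uses inline data", 8: "file layout v2"}


def missing_features(required, supported):
    """Return the required feature ids that the daemon does not know."""
    return sorted(set(required) - set(supported))


def renamed_features(required, supported):
    """Return ids present on both sides whose names disagree."""
    shared = sorted(set(required) & set(supported))
    return {fid: (required[fid], supported[fid])
            for fid in shared if required[fid] != supported[fid]}


def is_writeable(required, supported):
    """Assumed rule: every required incompat id must be supported."""
    return not missing_features(required, supported)


if __name__ == "__main__":
    print("missing ids:", missing_features(MDSMAP_INCOMPAT, DAEMON_INCOMPAT))
    print("renamed ids:", renamed_features(MDSMAP_INCOMPAT, DAEMON_INCOMPAT))
    print("writeable:", is_writeable(MDSMAP_INCOMPAT, DAEMON_INCOMPAT))
```

Under that assumption the daemon lacks id 9, and id 8 means "no anchor table" in the map but "file layout v2" in the daemon, so the daemon cannot safely write to the map.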

Adrien


On 21/03/2018 at 11:37, Martin Palma wrote:
Just ran into this problem on our production cluster....

It would have been nice if the release notes of 12.2.4 had been
adapted to inform users about this.

Best,
Martin

On Wed, Mar 14, 2018 at 9:53 PM, Gregory Farnum <gfarnum@xxxxxxxxxx>
wrote:
On Wed, Mar 14, 2018 at 12:41 PM, Lars Marowsky-Bree <lmb@xxxxxxxx>
wrote:
On 2018-03-14T06:57:08, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:

Yes. But the real outcome is not "no MDS [is] active" but "some or all
metadata I/O will pause" -- and there is no avoiding that. During an
MDS upgrade, a standby must take over the MDS being shut down (and
upgraded).  During takeover, metadata I/O will briefly pause as the
rank is unavailable. (Specifically, no other rank can obtain locks or
communicate with the "failed" rank; so metadata I/O will necessarily
pause until a standby takes over.) Single active vs. multiple active
upgrade makes little difference in this outcome.
Fair, except that there's no standby MDS at this time in case the update
goes wrong.

Is another approach theoretically feasible? Have the updated MDS only go
into the incompatible mode once there's a quorum of new ones available,
or something?
I believe so, yes. That option wasn't explored for this patch because
it was just disambiguating the compatibility flags and the full
side-effects weren't realized.
Would such a patch be accepted if we ended up pursuing this? Any
suggestions on how to best go about this?
It'd be ugly, but you'd have to set it up so that
* new MDSes advertise the old set of required values
* but can identify when all the MDSes are new
* then mark somewhere that they can use the correct values
* then switch to the proper requirements

I don't remember the details of this CompatSet code any more, and it's
definitely made trickier by the MDS having no permanent local state.
Since we do luckily have both the IDs and the strings, you might be
able to do something in the MDSMonitor to identify whether booting
MDSes have "too-old", "old-featureset-but-support-new-feature", or
"new, correct feature advertising" and then either massage that
incoming message down to the "old-featureset-but-support-new-feature"
(if not all the MDSes are new) or do an auto-upgrade of the required
features in the map. And you might also need compatibility code in the
MDS to make sure it sends out the appropriate bits on connection, but
I *think* the CompatSet checks are only done on the monitor and when
an MDS receives an MDSMap.
-Greg
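
The staged switch Greg outlines above could be sketched roughly as follows. This is an illustration of the idea only, not Ceph code: the class `FeatureGate`, the `switched` flag (standing in for "mark somewhere"), and the labels `OLD_SET` and `NEW_SET` are all hypothetical.

```python
from dataclasses import dataclass

OLD_SET = "old required features"  # placeholder label for the old values
NEW_SET = "new required features"  # placeholder label for the corrected values


@dataclass
class FeatureGate:
    # Hypothetical persistent marker: once set, the switch is never undone.
    switched: bool = False

    def required(self, daemons_support_new):
        """Pick the feature set to require.

        daemons_support_new holds one bool per known MDS, True if that MDS
        understands the new feature set.
        """
        all_new = bool(daemons_support_new) and all(daemons_support_new)
        if not self.switched and all_new:
            self.switched = True
        return NEW_SET if self.switched else OLD_SET


if __name__ == "__main__":
    gate = FeatureGate()
    print(gate.required([True, False]))  # mixed cluster: keep the old set
    print(gate.required([True, True]))   # all new: switch
    print(gate.required([True, False]))  # old MDS appears later: stay switched
```

Keeping the switch one-way mirrors the last two bullets: once the map requires the correct values, an old MDS that boots later is expected to be rejected rather than to drag the requirements back.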
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com