Hi,
I just had the same issue with our 12.2.4 cluster, but not during an
upgrade.
One of our 3 monitors restarted (the one with a standby MDS) and the 2
other active MDSes killed themselves:
2018-03-28 09:36:24.376888 7f910bc0f700 0 mds.cccephadm14
handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base
v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir
inode in separate object,5=mds uses versioned encoding,6=dirfrag is
stored in omap,8=no anchor table,9=file layout v2} not writeable with
daemon features compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds
uses inline data,8=file layout v2}, killing myself
2018-03-28 09:36:24.376903 7f910bc0f700 1 mds.cccephadm14 suicide.
wanted state up:active
2018-03-28 09:36:25.379607 7f910bc0f700 1 mds.1.62 shutdown: shutting
down rank 1
2018-03-28 09:36:24.375867 7fad455bf700 0 mds.cccephadm15
handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base
v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir
inode in separate object,5=mds uses versioned encoding,6=dirfrag is
stored in omap,8=no anchor table,9=file layout v2} not writeable with
daemon features compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds
uses inline data,8=file layout v2}, killing myself
2018-03-28 09:36:24.375883 7fad455bf700 1 mds.cccephadm15 suicide.
wanted state up:active
2018-03-28 09:36:25.377633 7fad455bf700 1 mds.0.50 shutdown: shutting
down rank 0
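For illustration, the two feature sets printed in the log lines above can be
compared with a short Python sketch. The strings are copied from the log; the
helper names (parse_incompat, unsupported) are made up for this example, and
the comparison by id and name is an assumption, not necessarily how Ceph's
CompatSet check works internally.

```python
import re

# Feature sets copied from the handle_mds_map log line above.
MAP_SET = ("compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,"
           "3=default file layouts on dirs,4=dir inode in separate object,"
           "5=mds uses versioned encoding,6=dirfrag is stored in omap,"
           "8=no anchor table,9=file layout v2}")
DAEMON_SET = ("compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,"
              "3=default file layouts on dirs,4=dir inode in separate object,"
              "5=mds uses versioned encoding,6=dirfrag is stored in omap,"
              "7=mds uses inline data,8=file layout v2}")


def parse_incompat(compatset):
    """Return the incompat={...} part of a CompatSet string as {id: name}."""
    match = re.search(r"incompat=\{([^}]*)\}", compatset)
    body = match.group(1) if match else ""
    features = {}
    for item in filter(None, body.split(",")):
        fid, name = item.split("=", 1)
        features[int(fid)] = name.strip()
    return features


def unsupported(required, supported):
    """Features in `required` that `supported` lacks under the same id and name.

    Assumption: ids and names are both compared; this is for illustration only.
    """
    return {fid: name for fid, name in required.items()
            if supported.get(fid) != name}


if __name__ == "__main__":
    print(unsupported(parse_incompat(MAP_SET), parse_incompat(DAEMON_SET)))
```

Under these assumptions the mismatch is in ids 8 and 9: the map lists
"8=no anchor table" and "9=file layout v2", while the daemon advertises
"7=mds uses inline data" and "8=file layout v2".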
I had to restart the MDS services manually to get it working again.
Adrien
On 21/03/2018 at 11:37, Martin Palma wrote:
Just ran into this problem on our production cluster....
It would have been nice if the release notes of 12.2.4 had been
adapted to inform users about this.
Best,
Martin
On Wed, Mar 14, 2018 at 9:53 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
On Wed, Mar 14, 2018 at 12:41 PM, Lars Marowsky-Bree <lmb@xxxxxxxx> wrote:
On 2018-03-14T06:57:08, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
Yes. But the real outcome is not "no MDS [is] active" but "some or all
metadata I/O will pause" -- and there is no avoiding that. During an
MDS upgrade, a standby must take over the MDS being shut down (and
upgraded). During takeover, metadata I/O will briefly pause as the
rank is unavailable. (Specifically, no other rank can obtain locks or
communicate with the "failed" rank; so metadata I/O will necessarily
pause until a standby takes over.) Single active vs. multiple active
upgrade makes little difference in this outcome.
Fair, except that there's no standby MDS at this time in case the update
goes wrong.
Is another approach theoretically feasible? Have the updated MDS only go
into the incompatible mode once there's a quorum of new ones available,
or something?
I believe so, yes. That option wasn't explored for this patch because
it was just disambiguating the compatibility flags and the full
side-effects weren't realized.
Would such a patch be accepted if we ended up pursuing this? Any
suggestions on how to best go about this?
It'd be ugly, but you'd have to set it up so that
* new MDSes advertise the old set of required values
* but can identify when all the MDSes are new
* then mark somewhere that they can use the correct values
* then switch to the proper requirements
I don't remember the details of this CompatSet code any more, and it's
definitely made trickier by the MDS having no permanent local state.
Since we do luckily have both the IDs and the strings, you might be
able to do something in the MDSMonitor to identify whether booting
MDSes have "too-old", "old-featureset-but-support-new-feature", or
"new, correct feature advertising" and then either massage that
incoming message down to the "old-featureset-but-support-new-feature"
(if not all the MDSes are new) or do an auto-upgrade of the required
features in the map. And you might also need compatibility code in the
MDS to make sure it sends out the appropriate bits on connection, but
I *think* the CompatSet checks are only done on the monitor and when
an MDS receives an MDSMap.
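
A rough Python sketch of the monitor-side decision described above, purely as
an illustration. Nothing here is Ceph code: the Kind enum, the decide function
and the action strings are hypothetical names, and reading "all the MDSes are
new" as "no MDS is too-old" is an interpretation of the text, not a confirmed
design.

```python
from enum import Enum


class Kind(Enum):
    # The three categories named in the message above.
    TOO_OLD = "too-old"
    OLD_SET_SUPPORTS_NEW = "old-featureset-but-support-new-feature"
    NEW = "new, correct feature advertising"


def decide(booting, others):
    """Pick a (hypothetical) monitor action for a booting MDS.

    booting: Kind of the MDS that is sending its boot message.
    others:  Kinds of the MDSes already known to the monitor.
    """
    everyone = [booting, *others]
    if all(kind is not Kind.TOO_OLD for kind in everyone):
        # Every MDS understands the new features: raise the map requirements.
        return "upgrade-map-requirements"
    if booting is Kind.NEW:
        # Old daemons are still around: present the new daemon's features
        # as the old set so the old daemons keep accepting the map.
        return "massage-down"
    # Old or transitional daemon booting into a mixed cluster: no change.
    return "accept-as-is"


if __name__ == "__main__":
    print(decide(Kind.NEW, [Kind.TOO_OLD]))
    print(decide(Kind.NEW, [Kind.NEW, Kind.OLD_SET_SUPPORTS_NEW]))
```

The point of the sketch is only that the decision depends on the whole set of
MDSes, not on the booting daemon alone; where that state would be recorded is
left open, as in the message above.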
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com