Re: Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

Hi,

I just had the same issue with our 12.2.4 cluster, but not during an upgrade. One of our 3 monitors restarted (the one with a standby MDS) and the 2 other, active MDSs killed themselves:

2018-03-28 09:36:24.376888 7f910bc0f700  0 mds.cccephadm14 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
2018-03-28 09:36:24.376903 7f910bc0f700  1 mds.cccephadm14 suicide. wanted state up:active
2018-03-28 09:36:25.379607 7f910bc0f700  1 mds.1.62 shutdown: shutting down rank 1


2018-03-28 09:36:24.375867 7fad455bf700  0 mds.cccephadm15 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
2018-03-28 09:36:24.375883 7fad455bf700  1 mds.cccephadm15 suicide. wanted state up:active
2018-03-28 09:36:25.377633 7fad455bf700  1 mds.0.50 shutdown: shutting down rank 0
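
To make the mismatch in these log lines easier to see, here is a small sketch that parses the two incompat sets and compares them. It assumes, as a simplification, that the check boils down to "every incompat feature ID required by the mdsmap must be known to the daemon"; the real CompatSet code is not shown in this thread, and the helper name parse_incompat is made up for illustration.

```python
import re

# Both strings are copied from the log lines above.
MDSMAP = ("compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,"
          "3=default file layouts on dirs,4=dir inode in separate object,"
          "5=mds uses versioned encoding,6=dirfrag is stored in omap,"
          "8=no anchor table,9=file layout v2}")
DAEMON = ("compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,"
          "3=default file layouts on dirs,4=dir inode in separate object,"
          "5=mds uses versioned encoding,6=dirfrag is stored in omap,"
          "7=mds uses inline data,8=file layout v2}")


def parse_incompat(text):
    """Hypothetical helper: turn 'incompat={1=a,2=b}' into {1: 'a', 2: 'b'}."""
    body = re.search(r"incompat=\{([^}]*)\}", text).group(1)
    features = {}
    for item in body.split(","):
        if item:
            fid, name = item.split("=", 1)
            features[int(fid)] = name
    return features


map_features = parse_incompat(MDSMAP)
daemon_features = parse_incompat(DAEMON)

# Feature IDs the map requires but the daemon does not advertise at all.
missing_ids = sorted(set(map_features) - set(daemon_features))

# IDs present on both sides but carrying different names.
renamed = {
    fid: (map_features[fid], daemon_features[fid])
    for fid in sorted(set(map_features) & set(daemon_features))
    if map_features[fid] != daemon_features[fid]
}

print("required by map, unknown to daemon:", missing_ids)
print("same ID, different name (map, daemon):", renamed)
```

Under that assumption, the daemon lacks ID 9, and ID 8 means "no anchor table" in the map but "file layout v2" in the daemon, which matches the compatibility-flag renumbering discussed further down in the thread.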

I had to restart the MDS services manually to get them working again.

Adrien

On 21/03/2018 at 11:37, Martin Palma wrote:
Just ran into this problem on our production cluster....

It would have been nice if the release notes of 12.2.4 had been
adapted to inform users about this.

Best,
Martin

On Wed, Mar 14, 2018 at 9:53 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
On Wed, Mar 14, 2018 at 12:41 PM, Lars Marowsky-Bree <lmb@xxxxxxxx> wrote:
On 2018-03-14T06:57:08, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:

Yes. But the real outcome is not "no MDS [is] active" but "some or all
metadata I/O will pause" -- and there is no avoiding that. During an
MDS upgrade, a standby must take over the MDS being shut down (and
upgraded).  During takeover, metadata I/O will briefly pause as the
rank is unavailable. (Specifically, no other rank can obtain locks or
communicate with the "failed" rank; so metadata I/O will necessarily
pause until a standby takes over.) Single active vs. multiple active
upgrade makes little difference in this outcome.
Fair, except that there's no standby MDS at this time in case the update
goes wrong.

Is another approach theoretically feasible? Have the updated MDS only go
into the incompatible mode once there's a quorum of new ones available,
or something?
I believe so, yes. That option wasn't explored for this patch because
it was just disambiguating the compatibility flags and the full
side-effects weren't realized.
Would such a patch be accepted if we ended up pursuing this? Any
suggestions on how to best go about this?
It'd be ugly, but you'd have to set it up so that
* new MDSes advertise the old set of required values
* but can identify when all the MDSes are new
* then mark somewhere that they can use the correct values
* then switch to the proper requirements

I don't remember the details of this CompatSet code any more, and it's
definitely made trickier by the MDS having no permanent local state.
Since we do luckily have both the IDs and the strings, you might be
able to do something in the MDSMonitor to identify whether booting
MDSes have "too-old", "old-featureset-but-support-new-feature", or
"new, correct feature advertising" and then either massage that
incoming message down to the "old-featureset-but-support-new-feature"
(if not all the MDSes are new) or do an auto-upgrade of the required
features in the map. And you might also need compatibility code in the
MDS to make sure it sends out the appropriate bits on connection, but
I *think* the CompatSet checks are only done on the monitor and when
an MDS receives an MDSMap.
-Greg
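
The monitor-side idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the MDSMonitor code: the three category labels are taken from the message, but the classification rule (matching the ID/name pairs seen in the logs earlier in this thread), the function names classify and decide, and the returned action strings are all hypothetical.

```python
# Category labels quoted from the message above.
TOO_OLD = "too-old"
OLD_SET = "old-featureset-but-support-new-feature"
NEW_SET = "new, correct feature advertising"

# ID/name pairs as they appear in the logs earlier in this thread.
# Assumption: these pairs are enough to tell the two numberings apart.
OLD_IDS = {7: "mds uses inline data", 8: "file layout v2"}
NEW_IDS = {8: "no anchor table", 9: "file layout v2"}


def classify(advertised):
    """Classify a booting MDS from the {id: name} features it advertises."""
    if all(advertised.get(fid) == name for fid, name in NEW_IDS.items()):
        return NEW_SET
    if all(advertised.get(fid) == name for fid, name in OLD_IDS.items()):
        return OLD_SET
    return TOO_OLD


def decide(booting):
    """Given {daemon_name: advertised features}, pick a (hypothetical) action.

    Returns (action, rejected_daemons). Required features are only upgraded
    once every accepted MDS uses the new numbering; otherwise everyone is
    kept on the old feature set.
    """
    kinds = {name: classify(adv) for name, adv in booting.items()}
    rejected = sorted(n for n, k in kinds.items() if k == TOO_OLD)
    accepted = [k for k in kinds.values() if k != TOO_OLD]
    if accepted and all(k == NEW_SET for k in accepted):
        return "upgrade required features in the map", rejected
    return "advertise old feature set", rejected


# Mixed cluster: one upgraded MDS, one not yet upgraded.
print(decide({"mds.a": NEW_IDS, "mds.b": OLD_IDS}))
# Fully upgraded cluster.
print(decide({"mds.a": NEW_IDS, "mds.b": NEW_IDS}))
```

The point of the sketch is the ordering: nothing switches to the new requirements until no old-numbering MDS remains, which is what would avoid active daemons killing themselves mid-upgrade.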
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com