Hi,

Which versions were those MDSes running before and after the standby MDS restarted?

Cheers, Dan

On Wed, Mar 28, 2018 at 11:11 AM, adrien.georget@xxxxxxxxxxx
<adrien.georget@xxxxxxxxxxx> wrote:
> Hi,
>
> I just had the same issue with our 12.2.4 cluster, but not during the
> upgrade.
> One of our 3 monitors restarted (the one with a standby MDS), and the 2
> other active MDSes killed themselves:
>
> 2018-03-28 09:36:24.376888 7f910bc0f700 0 mds.cccephadm14 handle_mds_map
> mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor
> table,9=file layout v2} not writeable with daemon features
> compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
> uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline
> data,8=file layout v2}, killing myself
> 2018-03-28 09:36:24.376903 7f910bc0f700 1 mds.cccephadm14 suicide. wanted
> state up:active
> 2018-03-28 09:36:25.379607 7f910bc0f700 1 mds.1.62 shutdown: shutting down
> rank 1
>
>
> 2018-03-28 09:36:24.375867 7fad455bf700 0 mds.cccephadm15 handle_mds_map
> mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor
> table,9=file layout v2} not writeable with daemon features
> compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
> uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline
> data,8=file layout v2}, killing myself
> 2018-03-28 09:36:24.375883 7fad455bf700 1 mds.cccephadm15 suicide. wanted
> state up:active
> 2018-03-28 09:36:25.377633 7fad455bf700 1 mds.0.50 shutdown: shutting down
> rank 0
>
> I had to restart the MDS services manually to get it working.
>
> Adrien
>
>
> On 21/03/2018 at 11:37, Martin Palma wrote:
>>
>> Just ran into this problem on our production cluster...
>>
>> It would have been nice if the release notes of 12.2.4 had been
>> adapted to inform users about this.
>>
>> Best,
>> Martin
>>
>> On Wed, Mar 14, 2018 at 9:53 PM, Gregory Farnum <gfarnum@xxxxxxxxxx>
>> wrote:
>>>
>>> On Wed, Mar 14, 2018 at 12:41 PM, Lars Marowsky-Bree <lmb@xxxxxxxx>
>>> wrote:
>>>>
>>>> On 2018-03-14T06:57:08, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>>>>
>>>>> Yes. But the real outcome is not "no MDS [is] active" but "some or all
>>>>> metadata I/O will pause" -- and there is no avoiding that. During an
>>>>> MDS upgrade, a standby must take over the MDS being shut down (and
>>>>> upgraded). During takeover, metadata I/O will briefly pause as the
>>>>> rank is unavailable. (Specifically, no other rank can obtain locks or
>>>>> communicate with the "failed" rank, so metadata I/O will necessarily
>>>>> pause until a standby takes over.) Single active vs. multiple active
>>>>> upgrade makes little difference in this outcome.
>>>>
>>>> Fair, except that there's no standby MDS at this time in case the update
>>>> goes wrong.
>>>>
>>>>>> Is another approach theoretically feasible? Have the updated MDS only go
>>>>>> into the incompatible mode once there's a quorum of new ones available,
>>>>>> or something?
>>>>>
>>>>> I believe so, yes. That option wasn't explored for this patch because
>>>>> it was just disambiguating the compatibility flags and the full
>>>>> side-effects weren't realized.
>>>>
>>>> Would such a patch be accepted if we ended up pursuing this? Any
>>>> suggestions on how to best go about this?
>>>
>>> It'd be ugly, but you'd have to set it up so that
>>> * new MDSes advertise the old set of required values
>>> * but can identify when all the MDSes are new
>>> * then mark somewhere that they can use the correct values
>>> * then switch to the proper requirements
>>>
>>> I don't remember the details of this CompatSet code any more, and it's
>>> definitely made trickier by the MDS having no permanent local state.
>>> Since we do luckily have both the IDs and the strings, you might be
>>> able to do something in the MDSMonitor to identify whether booting
>>> MDSes have "too-old", "old-featureset-but-support-new-feature", or
>>> "new, correct feature advertising" and then either massage that
>>> incoming message down to the "old-featureset-but-support-new-feature"
>>> (if not all the MDSes are new) or do an auto-upgrade of the required
>>> features in the map. And you might also need compatibility code in the
>>> MDS to make sure it sends out the appropriate bits on connection, but
>>> I *think* the CompatSet checks are only done on the monitor and when
>>> an MDS receives an MDSMap.
>>> -Greg
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
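To make the failure in Adrien's log concrete, here is a toy Python model of the incompat comparison. It assumes, as the "not writeable with daemon features" message suggests, that the check is done on feature IDs: the daemon must know every incompat ID the MDSMap requires. The two feature sets are copied from the log; the function names (`missing_ids`, `writeable`, `renamed_ids`) are hypothetical and are not Ceph's C++ `CompatSet` API.

```python
from __future__ import annotations

# Incompat features copied from the log: the set the MDSMap requires...
MAP_INCOMPAT = {
    1: "base v0.20",
    2: "client writeable ranges",
    3: "default file layouts on dirs",
    4: "dir inode in separate object",
    5: "mds uses versioned encoding",
    6: "dirfrag is stored in omap",
    8: "no anchor table",
    9: "file layout v2",
}

# ...and the set advertised by the daemon that killed itself.
DAEMON_INCOMPAT = {
    1: "base v0.20",
    2: "client writeable ranges",
    3: "default file layouts on dirs",
    4: "dir inode in separate object",
    5: "mds uses versioned encoding",
    6: "dirfrag is stored in omap",
    7: "mds uses inline data",
    8: "file layout v2",
}


def missing_ids(required: dict[int, str], supported: dict[int, str]) -> set[int]:
    """Incompat IDs the map requires that the daemon does not know about."""
    return set(required) - set(supported)


def writeable(required: dict[int, str], supported: dict[int, str]) -> bool:
    """Assumed ID-only check: every required incompat ID must be supported."""
    return not missing_ids(required, supported)


def renamed_ids(required: dict[int, str], supported: dict[int, str]) -> dict[int, tuple[str, str]]:
    """IDs present on both sides whose feature names disagree."""
    return {
        i: (required[i], supported[i])
        for i in sorted(set(required) & set(supported))
        if required[i] != supported[i]
    }


if __name__ == "__main__":
    print("writeable:", writeable(MAP_INCOMPAT, DAEMON_INCOMPAT))
    print("missing IDs:", sorted(missing_ids(MAP_INCOMPAT, DAEMON_INCOMPAT)))
    for i, (in_map, in_daemon) in renamed_ids(MAP_INCOMPAT, DAEMON_INCOMPAT).items():
        print(f"ID {i}: map says {in_map!r}, daemon says {in_daemon!r}")
```

Run as a script, it reports `writeable: False`, `missing IDs: [9]`, and that ID 8 means "no anchor table" in the map but "file layout v2" in the daemon. By name, "file layout v2" is present on both sides, just under different IDs, which is why having both the IDs and the strings could let the monitor tell the cases apart.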
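Greg's outline for the MDSMonitor can be sketched the same way. This is a hypothetical model of the monitor side only, under several assumptions: feature sets are reduced to the IDs whose meaning differs in the log (the shared features 1-6 are omitted), the map side of the log is taken to be the corrected numbering, and the `understands_new` flag is an invented stand-in for however a new MDS would signal that it supports the new feature while still advertising the old set. None of these names (`Kind`, `BootingMds`, `classify`, `handle_boot`) exist in Ceph.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

# Hypothetical: only the IDs whose meaning differs in the log are modeled.
OLD_INCOMPAT = {8: "file layout v2"}
NEW_INCOMPAT = {8: "no anchor table", 9: "file layout v2"}


class Kind(Enum):
    TOO_OLD = "too-old"
    OLD_BUT_SUPPORTS_NEW = "old-featureset-but-support-new-feature"
    NEW = "new, correct feature advertising"


@dataclass
class BootingMds:
    name: str
    incompat: dict[int, str]
    # Invented flag: a new MDS advertising the old set says it could use the new one.
    understands_new: bool = False


def classify(mds: BootingMds) -> Kind:
    """Sort a booting MDS into one of Greg's three categories."""
    if mds.incompat == NEW_INCOMPAT:
        return Kind.NEW
    if mds.understands_new:
        return Kind.OLD_BUT_SUPPORTS_NEW
    return Kind.TOO_OLD


def handle_boot(cluster: list[BootingMds], booting: BootingMds) -> tuple[BootingMds, dict[int, str]]:
    """Return the (possibly massaged) advertisement and the set the map should require."""
    members = cluster + [booting]
    if all(classify(m) is not Kind.TOO_OLD for m in members):
        # Every MDS understands the corrected numbering: auto-upgrade the map.
        return booting, dict(NEW_INCOMPAT)
    if classify(booting) is Kind.NEW:
        # Old MDSes are still present: massage the advertisement down.
        booting = BootingMds(booting.name, dict(OLD_INCOMPAT), understands_new=True)
    return booting, dict(OLD_INCOMPAT)


if __name__ == "__main__":
    old = BootingMds("a", dict(OLD_INCOMPAT))
    adv, required = handle_boot([old], BootingMds("b", dict(NEW_INCOMPAT)))
    print(classify(adv).value, required)
    adv2, required2 = handle_boot([adv], BootingMds("c", dict(NEW_INCOMPAT)))
    print(classify(adv2).value, required2)
```

In this model, booting a new MDS next to an old one leaves the map on the old set and downgrades the newcomer's advertisement; once no too-old MDS remains, the next boot upgrades the map's required features. As Greg notes, a real implementation would also have to persist that decision somewhere, since the MDS has no permanent local state.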