Re: Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

Dan van der Ster <dan@xxxxxxxxxxxxxx> · Wed, 28 Mar 2018 14:47:56 +0200

Hi,

Which versions were those MDSs running before and after the standby MDS was restarted?

Cheers, Dan

On Wed, Mar 28, 2018 at 11:11 AM, adrien.georget@xxxxxxxxxxx
<adrien.georget@xxxxxxxxxxx> wrote:
> Hi,
>
> I just had the same issue with our 12.2.4 cluster, but not during the
> upgrade.
> One of our 3 monitors restarted (the one with a standby MDS) and the 2
> other active MDSs killed themselves:
>
> 2018-03-28 09:36:24.376888 7f910bc0f700  0 mds.cccephadm14 handle_mds_map
> mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor
> table,9=file layout v2} not writeable with daemon features
> compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
> uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline
> data,8=file layout v2}, killing myself
> 2018-03-28 09:36:24.376903 7f910bc0f700  1 mds.cccephadm14 suicide. wanted
> state up:active
> 2018-03-28 09:36:25.379607 7f910bc0f700  1 mds.1.62 shutdown: shutting down
> rank 1
>
>
> 2018-03-28 09:36:24.375867 7fad455bf700  0 mds.cccephadm15 handle_mds_map
> mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor
> table,9=file layout v2} not writeable with daemon features
> compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
> uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline
> data,8=file layout v2}, killing myself
> 2018-03-28 09:36:24.375883 7fad455bf700  1 mds.cccephadm15 suicide. wanted
> state up:active
> 2018-03-28 09:36:25.377633 7fad455bf700  1 mds.0.50 shutdown: shutting down
> rank 0
>
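For readers comparing the two feature lists in the log lines above, here is a minimal Python sketch that diffs them. The feature IDs and names are copied from the log; the helper names (`missing_ids`, `renamed_ids`, `writeable`) are hypothetical, and this is only a model of the comparison, not Ceph's actual CompatSet code. In particular, whether the real check compares names or only IDs is not something this sketch asserts.

```python
# Feature sets copied from the log output; everything else is illustrative.
COMMON = {
    1: "base v0.20",
    2: "client writeable ranges",
    3: "default file layouts on dirs",
    4: "dir inode in separate object",
    5: "mds uses versioned encoding",
    6: "dirfrag is stored in omap",
}
# incompat set required by the mdsmap
MDSMAP_INCOMPAT = {**COMMON, 8: "no anchor table", 9: "file layout v2"}
# incompat set advertised by the daemon that killed itself
DAEMON_INCOMPAT = {**COMMON, 7: "mds uses inline data", 8: "file layout v2"}


def missing_ids(required, supported):
    """IDs the map requires that the daemon does not advertise at all."""
    return set(required) - set(supported)


def renamed_ids(required, supported):
    """IDs present on both sides but carrying different names."""
    shared = set(required) & set(supported)
    return {fid for fid in shared if required[fid] != supported[fid]}


def writeable(required, supported):
    """Hypothetical model: writeable only if no required ID is missing."""
    return not missing_ids(required, supported)


print("missing:", sorted(missing_ids(MDSMAP_INCOMPAT, DAEMON_INCOMPAT)))
print("renamed:", sorted(renamed_ids(MDSMAP_INCOMPAT, DAEMON_INCOMPAT)))
print("writeable:", writeable(MDSMAP_INCOMPAT, DAEMON_INCOMPAT))
```

The diff shows the two sides disagree on what ID 8 means, and that the map requires an ID 9 the daemon does not advertise.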
> I had to restart the MDS services manually to get it working again.
>
> Adrien
>
>
> On 21/03/2018 at 11:37, Martin Palma wrote:
>>
>> Just ran into this problem on our production cluster....
>>
>> It would have been nice if the release notes of 12.2.4 had been
>> adapted to inform users about this.
>>
>> Best,
>> Martin
>>
>> On Wed, Mar 14, 2018 at 9:53 PM, Gregory Farnum <gfarnum@xxxxxxxxxx>
>> wrote:
>>>
>>> On Wed, Mar 14, 2018 at 12:41 PM, Lars Marowsky-Bree <lmb@xxxxxxxx>
>>> wrote:
>>>>
>>>> On 2018-03-14T06:57:08, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>>>>
>>>>> Yes. But the real outcome is not "no MDS [is] active" but "some or all
>>>>> metadata I/O will pause" -- and there is no avoiding that. During an
>>>>> MDS upgrade, a standby must take over for the MDS being shut down (and
>>>>> upgraded).  During takeover, metadata I/O will briefly pause as the
>>>>> rank is unavailable. (Specifically, no other rank can obtain locks or
>>>>> communicate with the "failed" rank; so metadata I/O will necessarily
>>>>> pause until a standby takes over.) Single active vs. multiple active
>>>>> upgrade makes little difference in this outcome.
>>>>
>>>> Fair, except that there's no standby MDS at this time in case the update
>>>> goes wrong.
>>>>
>>>>>> Is another approach theoretically feasible? Have the updated MDS only go
>>>>>> into the incompatible mode once there's a quorum of new ones available,
>>>>>> or something?
>>>>>
>>>>> I believe so, yes. That option wasn't explored for this patch because
>>>>> it was just disambiguating the compatibility flags and the full
>>>>> side-effects weren't realized.
>>>>
>>>> Would such a patch be accepted if we ended up pursuing this? Any
>>>> suggestions on how to best go about this?
>>>
>>> It'd be ugly, but you'd have to set it up so that
>>> * new MDSes advertise the old set of required values
>>> * but can identify when all the MDSes are new
>>> * then mark somewhere that they can use the correct values
>>> * then switch to the proper requirements
>>>
>>> I don't remember the details of this CompatSet code any more, and it's
>>> definitely made trickier by the MDS having no permanent local state.
>>> Since we do luckily have both the IDs and the strings, you might be
>>> able to do something in the MDSMonitor to identify whether booting
>>> MDSes have "too-old", "old-featureset-but-support-new-feature", or
>>> "new, correct feature advertising" and then either massage that
>>> incoming message down to the "old-featureset-but-support-new-feature"
>>> (if not all the MDSes are new) or do an auto-upgrade of the required
>>> features in the map. And you might also need compatibility code in the
>>> MDS to make sure it sends out the appropriate bits on connection, but
>>> I *think* the CompatSet checks are only done on the monitor and when
>>> an MDS receives an MDSMap.
>>> -Greg
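The scheme Greg outlines above can be sketched roughly as follows. This is a hypothetical Python model only: the `Advert` enum, the `plan` function and the daemon names are invented for illustration and do not correspond to any MDSMonitor code. It also assumes that "all the MDSes are new" means no daemon is in the "too-old" class, which is one reading of the description rather than something the thread confirms.

```python
from enum import Enum


class Advert(Enum):
    """The three classes of booting MDS named in the message above."""
    TOO_OLD = "too-old"
    OLD_BUT_CAPABLE = "old-featureset-but-support-new-feature"
    NEW = "new, correct feature advertising"


def plan(adverts):
    """Decide what the map should require, given each daemon's class.

    adverts maps a (hypothetical) daemon name to its Advert class.
    Returns (requirement, recorded), where requirement is "old" or "new"
    and recorded is the per-daemon class after any massaging.
    """
    # Assumption: the cluster counts as fully upgraded once no daemon is TOO_OLD.
    all_new = bool(adverts) and all(
        a is not Advert.TOO_OLD for a in adverts.values()
    )
    if all_new:
        # Auto-upgrade the required features in the map.
        return "new", dict(adverts)
    # Otherwise keep the old requirements and massage any NEW
    # advertisement down to the transitional class.
    recorded = {
        name: Advert.OLD_BUT_CAPABLE if a is Advert.NEW else a
        for name, a in adverts.items()
    }
    return "old", recorded


mixed = {"mds.a": Advert.TOO_OLD, "mds.b": Advert.NEW}
upgraded = {"mds.a": Advert.OLD_BUT_CAPABLE, "mds.b": Advert.NEW}
print(plan(mixed)[0])
print(plan(upgraded)[0])
```

What happens to a "too-old" daemon beyond blocking the switch is left open here, as it is in the message.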
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com