Re: Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

Do you have the startup banners for mds.cccephadm14 and 15? It sure
looks like they were running 12.2.2 with the "not writeable with
daemon features" error.
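
For context, that check boils down to the MDSMap's required incompat
features having to be supported by the daemon itself; when they are not,
the MDS kills itself. Here is a minimal sketch of the idea (simplified,
with illustrative types and names, not the actual Ceph CompatSet code),
using the two feature sets from the log quoted below:

#include <iostream>
#include <map>
#include <string>

using FeatureSet = std::map<int, std::string>;  // incompat feature id -> label

// The map's required incompat ids must all be supported by the daemon.
bool writeable_with(const FeatureSet& map_incompat,
                    const FeatureSet& daemon_incompat) {
  for (const auto& f : map_incompat) {
    if (daemon_incompat.count(f.first) == 0)
      return false;  // map requires a feature id this daemon does not know
  }
  return true;
}

int main() {
  // The two sets from the log, trimmed to the ids that differ (the real
  // sets also carry ids 1-6, which match on both sides).
  FeatureSet map_features    = {{8, "no anchor table"}, {9, "file layout v2"}};
  FeatureSet daemon_features = {{7, "mds uses inline data"},
                                {8, "file layout v2"}};
  std::cout << std::boolalpha
            << writeable_with(map_features, daemon_features) << "\n";  // false
}

A 12.2.2-era daemon never advertises id 9, so the check fails and the MDS
logs "killing myself", which matches the excerpts below.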

-- dan

On Wed, Mar 28, 2018 at 3:12 PM, adrien.georget@xxxxxxxxxxx
<adrien.georget@xxxxxxxxxxx> wrote:
> Hi,
>
> All Ceph services were in 12.2.4 version.
>
> Adrien
>
>
> On 28/03/2018 at 14:47, Dan van der Ster wrote:
>>
>> Hi,
>>
>> Which versions were those MDSes running before and after the standby
>> MDS was restarted?
>>
>> Cheers, Dan
>>
>>
>>
>> On Wed, Mar 28, 2018 at 11:11 AM, adrien.georget@xxxxxxxxxxx
>> <adrien.georget@xxxxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> I just had the same issue with our 12.2.4 cluster but not during the
>>> upgrade.
>>> One of our 3 monitors restarted (the one with a standby MDS) and the two
>>> other active MDSes killed themselves:
>>>
>>> 2018-03-28 09:36:24.376888 7f910bc0f700  0 mds.cccephadm14 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
>>> 2018-03-28 09:36:24.376903 7f910bc0f700  1 mds.cccephadm14 suicide. wanted state up:active
>>> 2018-03-28 09:36:25.379607 7f910bc0f700  1 mds.1.62 shutdown: shutting down rank 1
>>>
>>>
>>> 2018-03-28 09:36:24.375867 7fad455bf700  0 mds.cccephadm15 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
>>> 2018-03-28 09:36:24.375883 7fad455bf700  1 mds.cccephadm15 suicide. wanted state up:active
>>> 2018-03-28 09:36:25.377633 7fad455bf700  1 mds.0.50 shutdown: shutting down rank 0
>>>
>>> I had to manually restart the MDS services to get them working again.
>>>
>>> Adrien
>>>
>>>
>>> On 21/03/2018 at 11:37, Martin Palma wrote:
>>>>
>>>> Just ran into this problem on our production cluster....
>>>>
>>>> It would have been nice if the release notes of 12.2.4 had been
>>>> updated to inform users about this.
>>>>
>>>> Best,
>>>> Martin
>>>>
>>>> On Wed, Mar 14, 2018 at 9:53 PM, Gregory Farnum <gfarnum@xxxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> On Wed, Mar 14, 2018 at 12:41 PM, Lars Marowsky-Bree <lmb@xxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>> On 2018-03-14T06:57:08, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>>>>>>
>>>>>>> Yes. But the real outcome is not "no MDS [is] active" but "some or all
>>>>>>> metadata I/O will pause" -- and there is no avoiding that. During an
>>>>>>> MDS upgrade, a standby must take over the MDS being shut down (and
>>>>>>> upgraded).  During takeover, metadata I/O will briefly pause as the
>>>>>>> rank is unavailable. (Specifically, no other rank can obtain locks or
>>>>>>> communicate with the "failed" rank; so metadata I/O will necessarily
>>>>>>> pause until a standby takes over.) Single active vs. multiple active
>>>>>>> upgrade makes little difference in this outcome.
>>>>>>
>>>>>> Fair, except that there's no standby MDS at this time in case the
>>>>>> update goes wrong.
>>>>>>
>>>>>>>> Is another approach theoretically feasible? Have the updated MDS only
>>>>>>>> go into the incompatible mode once there's a quorum of new ones
>>>>>>>> available, or something?
>>>>>>>
>>>>>>> I believe so, yes. That option wasn't explored for this patch because
>>>>>>> it was just disambiguating the compatibility flags and the full
>>>>>>> side-effects weren't realized.
>>>>>>
>>>>>> Would such a patch be accepted if we ended up pursuing this? Any
>>>>>> suggestions on how to best go about this?
>>>>>
>>>>> It'd be ugly, but you'd have to set it up so that
>>>>> * new MDSes advertise the old set of required values
>>>>> * but can identify when all the MDSes are new
>>>>> * then mark somewhere that they can use the correct values
>>>>> * then switch to the proper requirements
>>>>>
>>>>> I don't remember the details of this CompatSet code any more, and it's
>>>>> definitely made trickier by the MDS having no permanent local state.
>>>>> Since we do luckily have both the IDs and the strings, you might be
>>>>> able to do something in the MDSMonitor to identify whether booting
>>>>> MDSes have "too-old", "old-featureset-but-support-new-feature", or
>>>>> "new, correct feature advertising" and then either massage that
>>>>> incoming message down to the "old-featureset-but-support-new-feature"
>>>>> (if not all the MDSes are new) or do an auto-upgrade of the required
>>>>> features in the map. And you might also need compatibility code in the
>>>>> MDS to make sure it sends out the appropriate bits on connection, but
>>>>> I *think* the CompatSet checks are only done on the monitor and when
>>>>> an MDS receives an MDSMap.
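>>>>>
>>>>> Roughly, and only as a sketch (made-up types and names, feature sets
>>>>> trimmed to the ids that differ, not actual MDSMonitor code), the
>>>>> monitor-side rule could look like:
>>>>>
>>>>> #include <algorithm>
>>>>> #include <iostream>
>>>>> #include <map>
>>>>> #include <string>
>>>>> #include <vector>
>>>>>
>>>>> using FeatureSet = std::map<int, std::string>;  // incompat id -> label
>>>>>
>>>>> // 12.2.2-era and corrected required incompat sets, abbreviated; the
>>>>> // real sets also carry ids 1-6, which match on both sides.
>>>>> static const FeatureSet OLD_REQUIRED = {{7, "mds uses inline data"},
>>>>>                                         {8, "file layout v2"}};
>>>>> static const FeatureSet NEW_REQUIRED = {{7, "mds uses inline data"},
>>>>>                                         {8, "no anchor table"},
>>>>>                                         {9, "file layout v2"}};
>>>>>
>>>>> // Guess from the ids + strings whether a booting daemon is a new
>>>>> // binary. (A new binary that deliberately advertises the old set
>>>>> // would need some extra marker; that part is glossed over here.)
>>>>> bool is_new_binary(const FeatureSet& advertised) {
>>>>>   auto it = advertised.find(8);
>>>>>   return advertised.count(9) > 0 ||
>>>>>          (it != advertised.end() && it->second == "no anchor table");
>>>>> }
>>>>>
>>>>> // Only put the corrected required set into the MDSMap once every
>>>>> // daemon known to the map is new; until then keep (or massage boot
>>>>> // messages down to) the old set so old daemons are not killed.
>>>>> FeatureSet required_for_map(const std::vector<FeatureSet>& daemons) {
>>>>>   bool all_new = std::all_of(daemons.begin(), daemons.end(),
>>>>>                              is_new_binary);
>>>>>   return all_new ? NEW_REQUIRED : OLD_REQUIRED;
>>>>> }
>>>>>
>>>>> int main() {
>>>>>   // Two old-style daemons plus one new one: stay on the old set.
>>>>>   std::vector<FeatureSet> booted = {OLD_REQUIRED, OLD_REQUIRED,
>>>>>                                     NEW_REQUIRED};
>>>>>   std::cout << (required_for_map(booted) == NEW_REQUIRED) << "\n"; // 0
>>>>> }
>>>>>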
>>>>> -Greg
>>>>
>>>
>>>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



