Re: Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

We just ran into this problem on our production cluster.

It would have been nice if the 12.2.4 release notes had been
updated to inform users about this.

Best,
Martin

On Wed, Mar 14, 2018 at 9:53 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Wed, Mar 14, 2018 at 12:41 PM, Lars Marowsky-Bree <lmb@xxxxxxxx> wrote:
>> On 2018-03-14T06:57:08, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>>
>>> Yes. But the real outcome is not "no MDS [is] active" but "some or all
>>> metadata I/O will pause" -- and there is no avoiding that. During an
>>> MDS upgrade, a standby must take over the MDS being shut down (and
>>> upgraded).  During takeover, metadata I/O will briefly pause as the
>>> rank is unavailable. (Specifically, no other rank can obtain locks or
>>> communicate with the "failed" rank; so metadata I/O will necessarily
>>> pause until a standby takes over.) Single active vs. multiple active
>>> upgrade makes little difference in this outcome.
>>
>> Fair, except that there's no standby MDS at this time in case the update
>> goes wrong.
>>
>>> > Is another approach theoretically feasible? Have the updated MDS only go
>>> > into the incompatible mode once there's a quorum of new ones available,
>>> > or something?
>>> I believe so, yes. That option wasn't explored for this patch because
>>> it was just disambiguating the compatibility flags and the full
>>> side-effects weren't realized.
>>
>> Would such a patch be accepted if we ended up pursuing this? Any
>> suggestions on how to best go about this?
>
> It'd be ugly, but you'd have to set it up so that
> * new MDSes advertise the old set of required values
> * but can identify when all the MDSes are new
> * then mark somewhere that they can use the correct values
> * then switch to the proper requirements
>
> I don't remember the details of this CompatSet code any more, and it's
> definitely made trickier by the MDS having no permanent local state.
> Since we do luckily have both the IDs and the strings, you might be
> able to do something in the MDSMonitor to identify whether booting
> MDSes have "too-old", "old-featureset-but-support-new-feature", or
> "new, correct feature advertising" and then either massage that
> incoming message down to the "old-featureset-but-support-new-feature"
> (if not all the MDSes are new) or do an auto-upgrade of the required
> features in the map. And you might also need compatibility code in the
> MDS to make sure it sends out the appropriate bits on connection, but
> I *think* the CompatSet checks are only done on the monitor and when
> an MDS receives an MDSMap.
> -Greg
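
The steps Greg lists above can be sketched as a toy model. This is only an
illustration under stated assumptions, not Ceph code: ToyMonitor, Beacon,
classify and the feature names "base" and "new-feature" are all hypothetical,
and the real CompatSet holds (id, string) pairs rather than bare strings.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical feature names, invented for this sketch.
OLD_REQUIRED = frozenset({"base"})
NEW_FEATURE = "new-feature"
NEW_REQUIRED = OLD_REQUIRED | {NEW_FEATURE}


class Kind(Enum):
    TOO_OLD = "too-old"
    TRANSITIONAL = "old-featureset-but-support-new-feature"
    NEW = "new, correct feature advertising"


@dataclass(frozen=True)
class Beacon:
    name: str
    required: frozenset   # features the daemon advertises as required
    supported: frozenset  # features the daemon is able to handle


def classify(b: Beacon) -> Kind:
    """Sort a booting daemon into one of the three groups from the thread."""
    if NEW_FEATURE not in b.supported:
        return Kind.TOO_OLD
    if NEW_FEATURE in b.required:
        return Kind.NEW
    return Kind.TRANSITIONAL


@dataclass
class ToyMonitor:
    map_required: frozenset = OLD_REQUIRED
    daemons: dict = field(default_factory=dict)

    def handle_boot(self, b: Beacon) -> bool:
        """Register a booting daemon; return False if it is incompatible."""
        kind = classify(b)
        if self.map_required == NEW_REQUIRED and kind is Kind.TOO_OLD:
            return False  # the map was already upgraded; old daemons cannot join
        self.daemons[b.name] = b
        if all(classify(d) is not Kind.TOO_OLD for d in self.daemons.values()):
            # Every known daemon understands the new feature: upgrade the map.
            self.map_required = NEW_REQUIRED
        elif kind is Kind.NEW:
            # Mixed cluster: massage the message down to the old requirements.
            self.daemons[b.name] = Beacon(b.name, OLD_REQUIRED, b.supported)
        return True
```

In this model, booting an old daemon followed by a new one leaves map_required
at OLD_REQUIRED, and the map only switches to NEW_REQUIRED once the old daemon
has been replaced by a new one.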
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com