Hi,

Good to know, thanks. Yes, you need to restart a daemon to undo a
change applied via ceph.conf.

You can check exactly which config is currently in use, and where each
setting comes from, by running this directly on the mon host:

    ceph daemon mon.`hostname -s` config diff

The mons which had the setting applied via `ceph config set ...`
probably don't need to be restarted. Check what they currently have
set via config diff.

-- dan
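For reference, a rough sketch of that check-and-cleanup on a non-cephadm
mon host, assuming the mon id matches the short hostname (the host and
option names below are just the ones from this thread):

    # Show which options differ from their defaults on this mon and
    # where the current values come from:
    ceph daemon mon.`hostname -s` config diff

    # If mon_mds_skip_sanity was set in the cluster config database
    # (via ceph config set ...), it can usually be dropped without a restart:
    ceph config rm mon mon_mds_skip_sanity

    # If it was set in /etc/ceph/ceph.conf, remove that line and restart
    # the mon so the change takes effect:
    systemctl restart ceph-mon@`hostname -s`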
On Thu, Dec 9, 2021 at 7:32 PM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
>
> Hi
>
> Yes, using ceph config is working fine for the rest of the nodes.
>
> Do you know if it is necessary/advisable to restart the MONs after
> removing the mon_mds_skip_sanity setting when the upgrade is complete?
>
> Thanks, Chris
>
> On 09/12/2021 17:51, Dan van der Ster wrote:
> > Hi,
> >
> > On Thu, Dec 9, 2021 at 6:44 PM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
> >> Hi Dan & Patrick
> >>
> >> Setting that to true using "ceph config" didn't seem to work. I then
> >> deleted it from there and set it in ceph.conf on node1, and eventually,
> >> after a reboot, it started ok. I don't know for sure whether the failure
> >> when using ceph config was real or just a symptom of something else.
> >>
> >> I'll do the same (using ceph.conf) on the other nodes now.
> > Indeed, for a mon that is already asserting, you have confirmed that
> > it needs to be set in ceph.conf (otherwise it asserts before reading
> > the config map).
> >
> > The other approach -- ceph config set mon ... -- should still work in
> > general, provided it is done before the upgrade begins.
> >
> > You can see how cephadm does this here:
> > https://github.com/ceph/ceph/commit/753fd2fb32196d17e186152e7deaef1e0558b781
> >
> >> Btw, I can't actually see any release notes other than the highlights in
> >> the earlier posting (and 16.2.7 doesn't show up on the web site list of
> >> releases yet). Is there anything else that I would need to know?
> > The Release Notes PR is here: https://github.com/ceph/ceph/pull/44131
> > See my comment at the bottom.
> >
> > Thanks for catching this!
> >
> > Cheers, Dan
> >
> >
> >> Thanks for your very fast responses!
> >> Chris
> >>
> >> On 09/12/2021 17:10, Dan van der Ster wrote:
> >>> On Thu, Dec 9, 2021 at 5:40 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
> >>>> Hi Chris,
> >>>>
> >>>> On Thu, Dec 9, 2021 at 10:40 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
> >>>>> Hi
> >>>>>
> >>>>> I've just started an upgrade of a test cluster from 16.2.6 -> 16.2.7 and
> >>>>> immediately hit a problem.
> >>>>>
> >>>>> The cluster started as Octopus and has upgraded through to 16.2.6
> >>>>> without any trouble. It is a conventional deployment on Debian 10, NOT
> >>>>> using cephadm. All was clean before the upgrade. It contains nodes as
> >>>>> follows:
> >>>>> - Node 1: MON, MGR, MDS, RGW
> >>>>> - Node 2: MON, MGR, MDS, RGW
> >>>>> - Node 3: MON
> >>>>> - Nodes 4-6: OSDs
> >>>>>
> >>>>> In the absence of any specific upgrade instructions for 16.2.7, I
> >>>>> upgraded Node 1 and rebooted. The MON on that host will now not start,
> >>>>> throwing the following assertion:
> >>>>>
> >>>>> 2021-12-09T14:56:40.098+00:00 xxxxtstmon01 ceph-mon[960]: /build/ceph-16.2.7/src/mds/FSMap.cc: In function 'void FSMap::sanity(bool) const' thread 7f2d309085c0 time 2021-12-09T14:56:40.098395+0000
> >>>>> 2021-12-09T14:56:40.098+00:00 xxxxtstmon01 ceph-mon[960]: /build/ceph-16.2.7/src/mds/FSMap.cc: 868: FAILED ceph_assert(info.compat.writeable(fs->mds_map.compat))
> >>>>> 2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
> >>>>> 2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x7f2d3222423c]
> >>>>> 2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 2: /usr/lib/ceph/libceph-common.so.2(+0x277414) [0x7f2d32224414]
> >>>>> 2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 3: (FSMap::sanity(bool) const+0x2a8) [0x7f2d327331c8]
> >>>>> 2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 4: (MDSMonitor::update_from_paxos(bool*)+0x396) [0x55a32fe6b546]
> >>>>> 2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 5: (PaxosService::refresh(bool*)+0x10a) [0x55a32fd960ca]
> >>>>> 2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 6: (Monitor::refresh_from_paxos(bool*)+0x17c) [0x55a32fc54bec]
> >>>>> 2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 7: (Monitor::init_paxos()+0xfc) [0x55a32fc54e9c]
> >>>>> 2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 8: (Monitor::preinit()+0xbb9) [0x55a32fc7eb09]
> >>>>> 2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 9: main()
> >>>>> 2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 10: __libc_start_main()
> >>>>> 2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 11: _start()
> >>>>>
> >>>>> ceph health detail merely shows mon01 down, plus the 5 crashes that
> >>>>> occurred before the service stopped auto-restarting.
> >>>> Please disable mon_mds_skip_sanity in the mons' ceph.conf:
> >>>>
> >>>> [mon]
> >>>> mon_mds_skip_sanity = false
> >>> Oops, I think you meant mon_mds_skip_sanity = true
> >>>
> >>> Chris, does that allow that mon to start up?
> >>>
> >>> -- dan
> >>>
> >>>
> >>>
> >>>> The cephadm upgrade sequence already does this, but I forgot (sorry!)
> >>>> to mention in the release notes that it is required for manual upgrades.
> >>>>
> >>>> Please re-enable the sanity check after the upgrade completes and the
> >>>> cluster is stable.
> >>>>
> >>>> --
> >>>> Patrick Donnelly, Ph.D.
> >>>> He / Him / His
> >>>> Principal Software Engineer
> >>>> Red Hat, Inc.
> >>>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
> >>>>
> >>>> _______________________________________________
> >>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
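For anyone following the same manual (non-cephadm) route, a rough sketch
of the workaround discussed above, applied to each mon host in turn
(package-upgrade commands omitted; the mon id is assumed to match the
short hostname):

    # 1. Before upgrading this mon to 16.2.7, add to /etc/ceph/ceph.conf:
    #
    #        [mon]
    #        mon_mds_skip_sanity = true
    #
    # 2. Upgrade the Ceph packages as usual, then restart the mon:
    systemctl restart ceph-mon@`hostname -s`

    # 3. Once all mons run 16.2.7 and the cluster is healthy, remove
    #    mon_mds_skip_sanity from ceph.conf again and restart each mon
    #    one more time so the FSMap sanity check is re-enabled:
    systemctl restart ceph-mon@`hostname -s`

Cephadm deployments do the equivalent automatically, setting the option
in the config database before the mons are upgraded (see the commit
linked earlier in the thread).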