Hi Dan & Patrick
Setting that to true using "ceph config" didn't seem to work. I then
deleted it from there and set it in ceph.conf on node1, and after a
reboot the mon eventually started OK. I don't know for sure whether the
failure via "ceph config" was real or just a symptom of something else.
I'll do the same (using ceph.conf) on the other nodes now.
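For reference, this is roughly the sequence I went through, from memory,
so treat it as a sketch rather than an exact transcript (the hostname is
just my node1):

    # Tried first via the central config (the mon still hit the assert):
    ceph config set mon mon_mds_skip_sanity true
    # Removed that again:
    ceph config rm mon mon_mds_skip_sanity
    # Then added this to /etc/ceph/ceph.conf on node1 and rebooted:
    [mon]
        mon_mds_skip_sanity = true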
Btw, I can't actually see any release notes other than the highlights in
the earlier posting (and 16.2.7 doesn't show up in the list of releases
on the website yet). Is there anything else I would need to know?
Thanks for your very fast responses!
Chris
On 09/12/2021 17:10, Dan van der Ster wrote:
On Thu, Dec 9, 2021 at 5:40 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
Hi Chris,
On Thu, Dec 9, 2021 at 10:40 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
Hi
I've just started an upgrade of a test cluster from 16.2.6 -> 16.2.7 and
immediately hit a problem.
The cluster started as Octopus and has been upgraded through to 16.2.6
without any trouble. It is a conventional deployment on Debian 10, NOT
using cephadm. All was clean before the upgrade. It contains nodes as
follows:
- Node 1: MON, MGR, MDS, RGW
- Node 2: MON, MGR, MDS, RGW
- Node 3: MON
- Node 4-6: OSDs
In the absence of any specific upgrade instructions for 16.2.7, I
upgraded Node 1 and rebooted. The MON on that host now won't start,
throwing the following assertion:
2021-12-09T14:56:40.098+00:00 xxxxtstmon01 ceph-mon[960]: /build/ceph-16.2.7/src/mds/FSMap.cc: In function 'void FSMap::sanity(bool) const' thread 7f2d309085c0 time 2021-12-09T14:56:40.098395+0000
2021-12-09T14:56:40.098+00:00 xxxxtstmon01 ceph-mon[960]: /build/ceph-16.2.7/src/mds/FSMap.cc: 868: FAILED ceph_assert(info.compat.writeable(fs->mds_map.compat))
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x7f2d3222423c]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 2: /usr/lib/ceph/libceph-common.so.2(+0x277414) [0x7f2d32224414]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 3: (FSMap::sanity(bool) const+0x2a8) [0x7f2d327331c8]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 4: (MDSMonitor::update_from_paxos(bool*)+0x396) [0x55a32fe6b546]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 5: (PaxosService::refresh(bool*)+0x10a) [0x55a32fd960ca]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 6: (Monitor::refresh_from_paxos(bool*)+0x17c) [0x55a32fc54bec]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 7: (Monitor::init_paxos()+0xfc) [0x55a32fc54e9c]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 8: (Monitor::preinit()+0xbb9) [0x55a32fc7eb09]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 9: main()
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 10: __libc_start_main()
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 11: _start()
ceph health detail merely shows mon01 as down, plus the 5 crashes that occurred before the service stopped auto-restarting.
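(For completeness: if the ceph-crash agent is running on that node, the
same backtraces should also be visible via the crash module, with
something like the following, where <crash-id> is one of the IDs listed
by the first command:

    ceph crash ls
    ceph crash info <crash-id>
)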
Please disable mon_mds_skip_sanity in the mons' ceph.conf:
[mon]
mon_mds_skip_sanity = false
Oops, I think you meant mon_mds_skip_sanity = true.
Chris, does that allow that mon to start up?
-- dan
The cephadm upgrade sequence already does this, but I forgot (sorry!)
to mention in the release notes that it is required for manual
upgrades.
Please re-enable after the upgrade completes and the cluster is stable.
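(Concretely, once all the mons are running 16.2.7, the workaround can be
undone with something along these lines, depending on where it was set;
adjust to whatever method was actually used:

    # if it was set centrally:
    ceph config rm mon mon_mds_skip_sanity
    # or remove the mon_mds_skip_sanity line from the [mon] section of
    # ceph.conf and restart the mons, e.g.:
    systemctl restart ceph-mon.target
)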
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx