Hi
Yes, using ceph config is working fine for the rest of the nodes.
Do you know if it is necessary/advisable to restart the MONs after
removing the mon_mds_skip_sanity setting when the upgrade is complete?
Thanks, Chris
On 09/12/2021 17:51, Dan van der Ster wrote:
Hi,
On Thu, Dec 9, 2021 at 6:44 PM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
Hi Dan & Patrick
Setting that to true using "ceph config" didn't seem to work. I then
deleted it from there and set it in ceph.conf on node1, and eventually,
after a reboot, it started OK. I don't know for sure whether the failure
via "ceph config" was real or just a symptom of something else.
I'll do the same (using ceph.conf) on the other nodes now.
Indeed, for a mon that is already asserting, you have confirmed that
it needs to be set in ceph.conf (otherwise it asserts before reading
the config map).
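i.e., for the asserting mon, add to its ceph.conf:

[mon]
mon_mds_skip_sanity = true

then restart it, something like "systemctl restart ceph-mon@<hostname>",
assuming the usual non-cephadm systemd unit naming.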
The other approach -- ceph config set mon ... -- should still work in
general, provided it is done before the upgrade begins.
You can see how cephadm does this here:
https://github.com/ceph/ceph/commit/753fd2fb32196d17e186152e7deaef1e0558b781
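In short, going by that commit, before the first mon is upgraded it
should just be:

ceph config set mon mon_mds_skip_sanity true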
Btw, I can't actually see any release notes other than the highlights in
the earlier posting (and 16.2.7 doesn't show up on the website's list of
releases yet). Is there anything else I would need to know?
The Release Notes PR is here: https://github.com/ceph/ceph/pull/44131
See my comment at the bottom.
Thanks for catching this!
Cheers, Dan
Thanks for your very fast responses!
Chris
On 09/12/2021 17:10, Dan van der Ster wrote:
On Thu, Dec 9, 2021 at 5:40 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
Hi Chris,
On Thu, Dec 9, 2021 at 10:40 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
Hi
I've just started an upgrade of a test cluster from 16.2.6 -> 16.2.7 and
immediately hit a problem.
The cluster started as octopus, and has upgraded through to 16.2.6
without any trouble. It is a conventional deployment on Debian 10, NOT
using cephadm. All was clean before the upgrade. It contains nodes as
follows:
- Node 1: MON, MGR, MDS, RGW
- Node 2: MON, MGR, MDS, RGW
- Node 3: MON
- Node 4-6: OSDs
In the absence of any specific upgrade instructions for 16.2.7, I
upgraded Node 1 and rebooted. The MON on that host will now not start,
throwing the following assertion:
2021-12-09T14:56:40.098+00:00 xxxxtstmon01 ceph-mon[960]: /build/ceph-16.2.7/src/mds/FSMap.cc: In function 'void FSMap::sanity(bool) const' thread 7f2d309085c0 time 2021-12-09T14:56:40.098395+0000
2021-12-09T14:56:40.098+00:00 xxxxtstmon01 ceph-mon[960]: /build/ceph-16.2.7/src/mds/FSMap.cc: 868: FAILED ceph_assert(info.compat.writeable(fs->mds_map.compat))
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x7f2d3222423c]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 2: /usr/lib/ceph/libceph-common.so.2(+0x277414) [0x7f2d32224414]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 3: (FSMap::sanity(bool) const+0x2a8) [0x7f2d327331c8]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 4: (MDSMonitor::update_from_paxos(bool*)+0x396) [0x55a32fe6b546]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 5: (PaxosService::refresh(bool*)+0x10a) [0x55a32fd960ca]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 6: (Monitor::refresh_from_paxos(bool*)+0x17c) [0x55a32fc54bec]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 7: (Monitor::init_paxos()+0xfc) [0x55a32fc54e9c]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 8: (Monitor::preinit()+0xbb9) [0x55a32fc7eb09]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 9: main()
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 10: __libc_start_main()
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]: 11: _start()
ceph health detail merely shows mon01 down, plus the five crashes that occurred before the service stopped auto-restarting.
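(After those five crashes the systemd unit hits its start limit, so to
retry by hand I believe I'd first need something like:

systemctl reset-failed ceph-mon@xxxxtstmon01
systemctl start ceph-mon@xxxxtstmon01

assuming the default start-limit settings.)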
Please disable mon_mds_skip_sanity in the mons' ceph.conf:
[mon]
mon_mds_skip_sanity = false
Oops, I think you meant mon_mds_skip_sanity = true
Chris does that allow that mon to startup?
-- dan
The cephadm upgrade sequence already does this, but I forgot (sorry!)
to mention in the release notes that it is required for manual
upgrades.
Please remove the setting (re-enabling the sanity checks) after the
upgrade completes and the cluster is stable.
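Once everything is on 16.2.7, that would be something like:

ceph config rm mon mon_mds_skip_sanity

(or deleting the line from ceph.conf, wherever you set it).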
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx