Re: Ceph Pacific mon is not starting after host reboot

Hi,

I wanted to explore the stretch mode [1] in Pacific (16.2.4) and see how it behaves during a DC failure. It seems as if I'm hitting the same, or at least a similar, issue here. To verify whether stretch mode is the cause, I removed the cluster, rebuilt it without stretch mode (three hosts in three DCs) and started rebooting. First I rebooted one node and the cluster came back to HEALTH_OK. Then I rebooted two of the three nodes and again everything recovered successfully. After that I rebuilt a 5-node cluster with two DCs in stretch mode and three MONs, one of them being a tiebreaker in a virtual third DC. The stretch rule was applied (4 replicas across all 4 nodes).
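
The stretch mode setup followed the documented procedure [1]; roughly, as a sketch (the MON names pacific1-3, the DC names and the rule name "stretch_rule" are placeholders, not a verbatim transcript of my commands):

# switch monitor elections to the connectivity strategy
ceph mon set election_strategy connectivity
# assign each MON its datacenter (pacific3 is the tiebreaker in the virtual third DC)
ceph mon set_location pacific1 datacenter=dc1
ceph mon set_location pacific2 datacenter=dc2
ceph mon set_location pacific3 datacenter=dc3
# enable stretch mode with pacific3 as tiebreaker and a previously created
# CRUSH rule "stretch_rule" (4 replicas, 2 per DC)
ceph mon enable_stretch_mode pacific3 stretch_rule datacenter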

To test a DC failure I simply shut down the two nodes in DC2. Although Ceph reduced the pool's min_size to 1, I couldn't read or write anything on a mapped RBD, although Ceph itself was still responsive with two active MONs. When I booted the other two nodes again, the cluster was not able to recover: it ends up in a loop of restarting the MON containers (the OSDs recover eventually) until systemd shuts them down due to too many restarts. For a couple of seconds I get a ceph status, but I never get all three MONs up. When there are two MONs up and I restart the missing one, a different MON is shut down.
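
For what it's worth, the restart loop can be watched from the hosts with something like this (just a sketch; <fsid> and the MON name are placeholders for my lab values):

# on the affected host: systemd units and recent MON container logs
systemctl list-units 'ceph-<fsid>@mon.*'
journalctl -u 'ceph-<fsid>@mon.pacific1' -n 200 --no-pager
# while the cluster briefly has a quorum:
ceph orch ps --daemon-type mon
cephadm logs --name mon.pacific1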

I also see the error message mentioned earlier in this thread:

heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7ff3b3aa5700' had timed out after 0.000000000s

I'll add some more information: a stack trace from the MON failure:

---snip---
2021-05-25T15:44:26.988562+02:00 pacific1 conmon[5132]: 5 mon.pacific1@0(leader).paxos(paxos updating c 9288..9839) is_readable = 1 - now=2021-05-25T13:44:26.730359+0000 lease_expire=2021-05-25T13:44:30.270907+0000 has v0 lc 9839
2021-05-25T15:44:26.988638+02:00 pacific1 conmon[5132]: debug -5> 2021-05-25T13:44:26.726+0000 7ff3b1aa1700 2 mon.pacific1@0(leader) e13 send_reply 0x55e37aae3860 0x55e37affa9c0 auth_reply(proto 2 0 (0) Success) v1
2021-05-25T15:44:26.988714+02:00 pacific1 conmon[5132]: debug -4> 2021-05-25T13:44:26.726+0000 7ff3b1aa1700 5 mon.pacific1@0(leader).paxos(paxos updating c 9288..9839) is_readable = 1 - now=2021-05-25T13:44:26.731084+0000 lease_expire=2021-05-25T13:44:30.270907+0000 has v0 lc 9839
2021-05-25T15:44:26.988790+02:00 pacific1 conmon[5132]: debug -3> 2021-05-25T13:44:26.726+0000 7ff3b1aa1700 2 mon.pacific1@0(leader) e13 send_reply 0x55e37b14def0 0x55e37ab11ba0 auth_reply(proto 2 0 (0) Success) v1
2021-05-25T15:44:26.988929+02:00 pacific1 conmon[5132]: debug -2> 2021-05-25T13:44:26.730+0000 7ff3b1aa1700 5 mon.pacific1@0(leader).osd e117 send_incremental [105..117] to client.84146
2021-05-25T15:44:26.989012+02:00 pacific1 conmon[5132]: debug -1> 2021-05-25T13:44:26.734+0000 7ff3b1aa1700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: In function 'void OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7ff3b1aa1700 time 2021-05-25T13:44:26.732857+0000
2021-05-25T15:44:26.989087+02:00 pacific1 conmon[5132]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: 658: FAILED ceph_assert(target_v >= 9)
2021-05-25T15:44:26.989163+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.989239+02:00 pacific1 conmon[5132]: ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.989314+02:00 pacific1 conmon[5132]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7ff3bf61a59c]
2021-05-25T15:44:26.989388+02:00 pacific1 conmon[5132]: 2: /usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.989489+02:00 pacific1 conmon[5132]: 3: (OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.989560+02:00 pacific1 conmon[5132]: 4: (OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&, unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.989627+02:00 pacific1 conmon[5132]: 5: (OSDMonitor::get_version(unsigned long, unsigned long, ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.989693+02:00 pacific1 conmon[5132]: 6: (OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned long)+0x301) [0x55e377b3a3c1]
2021-05-25T15:44:26.989759+02:00 pacific1 conmon[5132]: 7: (OSDMonitor::send_incremental(unsigned int, MonSession*, bool, boost::intrusive_ptr<MonOpRequest>)+0x104) [0x55e377b3b094]
2021-05-25T15:44:26.989825+02:00 pacific1 conmon[5132]: 8: (OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
2021-05-25T15:44:26.989891+02:00 pacific1 conmon[5132]: 9: (Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xe82) [0x55e3779da402]
2021-05-25T15:44:26.989967+02:00 pacific1 conmon[5132]: 10: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d) [0x55e377a002ed]
2021-05-25T15:44:26.990046+02:00 pacific1 conmon[5132]: 11: (Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
2021-05-25T15:44:26.990113+02:00 pacific1 conmon[5132]: 12: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x55e377a2ffdc]
2021-05-25T15:44:26.990179+02:00 pacific1 conmon[5132]: 13: (DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
2021-05-25T15:44:26.990255+02:00 pacific1 conmon[5132]: 14: (DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
2021-05-25T15:44:26.990330+02:00 pacific1 conmon[5132]: 15: /lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
2021-05-25T15:44:26.990420+02:00 pacific1 conmon[5132]:  16: clone()
2021-05-25T15:44:26.990497+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.990573+02:00 pacific1 conmon[5132]: debug 0> 2021-05-25T13:44:26.742+0000 7ff3b1aa1700 -1 *** Caught signal (Aborted) **
2021-05-25T15:44:26.990648+02:00 pacific1 conmon[5132]: in thread 7ff3b1aa1700 thread_name:ms_dispatch
2021-05-25T15:44:26.990723+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.990806+02:00 pacific1 conmon[5132]: ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.990883+02:00 pacific1 conmon[5132]: 1: /lib64/libpthread.so.0(+0x12b20) [0x7ff3bd114b20]
2021-05-25T15:44:26.990958+02:00 pacific1 conmon[5132]:  2: gsignal()
2021-05-25T15:44:26.991034+02:00 pacific1 conmon[5132]:  3: abort()
2021-05-25T15:44:26.991110+02:00 pacific1 conmon[5132]: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7ff3bf61a5ed]
2021-05-25T15:44:26.991176+02:00 pacific1 conmon[5132]: 5: /usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.991251+02:00 pacific1 conmon[5132]: 6: (OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.991326+02:00 pacific1 conmon[5132]: 7: (OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&, unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.991393+02:00 pacific1 conmon[5132]: 8: (OSDMonitor::get_version(unsigned long, unsigned long, ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.991460+02:00 pacific1 conmon[5132]: 9: (OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned long)+0x301) [0x55e377b3a3c1]
2021-05-25T15:44:26.991557+02:00 pacific1 conmon[5132]: 10: (OSDMonitor::send_incremental(unsigned int, MonSession*, bool, boost::intrusive_ptr<MonOpRequest>)+0x104) [0x55e377b3b094]
2021-05-25T15:44:26.991628+02:00 pacific1 conmon[5132]: 11: (OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
2021-05-25T15:44:26.991695+02:00 pacific1 conmon[5132]: 12: (Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xe82) [0x55e3779da402]
2021-05-25T15:44:26.991761+02:00 pacific1 conmon[5132]: 13: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d) [0x55e377a002ed]
2021-05-25T15:44:26.991827+02:00 pacific1 conmon[5132]: 14: (Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
2021-05-25T15:44:26.991893+02:00 pacific1 conmon[5132]: 15: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x55e377a2ffdc]
2021-05-25T15:44:26.991959+02:00 pacific1 conmon[5132]: 16: (DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
2021-05-25T15:44:26.992025+02:00 pacific1 conmon[5132]: 17: (DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
2021-05-25T15:44:26.992091+02:00 pacific1 conmon[5132]: 18: /lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
2021-05-25T15:44:26.992156+02:00 pacific1 conmon[5132]:  19: clone()
---snip---


I can't tell if this is due to the limited resources in my virtual cluster, but since the non-stretch setup works as expected, I figure this could be a problem with stretch mode. I can provide more information if required, just let me know what I can do.

Regards,
Eugen


[1] https://docs.ceph.com/en/latest/rados/operations/stretch-mode/


Quoting Adrian Nicolae <adrian.nicolae@xxxxxxxxxx>:

Hi guys,

I'm testing Ceph Pacific 16.2.4 in my lab before deciding whether to put it into production on a 1PB+ storage cluster with rgw-only access.

I noticed a weird issue with my mons:

- if I reboot a mon host, the ceph-mon container does not start after the reboot

- I can see the following output with 'ceph orch ps':

mon.node01               node01               running (20h)   4m ago     20h   16.2.4     8d91d370c2b8  0a2e86af94b2
mon.node02               node02               running (115m)  12s ago    115m  16.2.4     8d91d370c2b8  51f4885a1b06
mon.node03               node03               stopped         4m ago     19h   <unknown>  <unknown>     <unknown>

(where node03 is the host which was rebooted).

- I tried to start the mon container manually on node03 with '/bin/bash /var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.run' and got the following output:

debug 2021-05-23T08:24:25.192+0000 7f9a9e358700  0 mon.node03@-1(???).osd e408 crush map has features 3314933069573799936, adjusting msgr requires
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700  0 mon.node03@-1(???).osd e408 crush map has features 432629308056666112, adjusting msgr requires
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700  0 mon.node03@-1(???).osd e408 crush map has features 432629308056666112, adjusting msgr requires
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700  0 mon.node03@-1(???).osd e408 crush map has features 432629308056666112, adjusting msgr requires
cluster 2021-05-23T08:07:12.189243+0000 mgr.node01.ksitls (mgr.14164) 36380 : cluster [DBG] pgmap v36392: 417 pgs: 417 active+clean; 33 KiB data, 605 MiB used, 651 GiB / 652 GiB avail; 9.6 KiB/s rd, 0 B/s wr, 15 op/s
debug 2021-05-23T08:24:25.196+0000 7f9a9e358700  1 mon.node03@-1(???).paxosservice(auth 1..51) refresh upgraded, format 0 -> 3
debug 2021-05-23T08:24:25.208+0000 7f9a88176700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f9a88176700' had timed out after 0.000000000s
debug 2021-05-23T08:24:25.208+0000 7f9a9e358700  0 mon.node03@-1(probing) e5  my rank is now 1 (was -1)
debug 2021-05-23T08:24:25.212+0000 7f9a87975700  0 mon.node03@1(probing) e6  removed from monmap, suicide.

root@node03:/home/adrian# systemctl status ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@mon.node03.service
● ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@mon.node03.service - Ceph mon.node03 for c2d41ac4-baf5-11eb-865d-2dc838a337a3
     Loaded: loaded (/etc/systemd/system/ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Sun 2021-05-23 08:10:00 UTC; 16min ago
    Process: 1176 ExecStart=/bin/bash /var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.run (code=exited, status=0/SUCCESS)
    Process: 1855 ExecStop=/usr/bin/docker stop ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3-mon.node03 (code=exited, status=1/FAILURE)
    Process: 1861 ExecStopPost=/bin/bash /var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.poststop (code=exited, status=0/SUCCESS)
   Main PID: 1176 (code=exited, status=0/SUCCESS)
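
The "removed from monmap, suicide" message suggests the mon was dropped from the monmap while node03 was down. A quick check (just a sketch, run from a node that still has a working mon, plus cephadm on node03) would be something like:

ceph mon dump                       # current monmap epoch and members
ceph quorum_status --format json-pretty
cephadm ls                          # on node03: what cephadm still knows about the local daemons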

The only fix I could find was to redeploy the mon with:

ceph orch daemon rm  mon.node03 --force
ceph orch daemon add mon node03
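
To confirm the redeployed mon actually rejoined the quorum (sketch only):

ceph orch ps --daemon-type mon      # all three mons should show "running"
ceph mon stat                       # quorum membership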

However, even though it works after a redeploy, an issue like this doesn't give me a lot of confidence to use it in a production environment. I could reproduce it with two different mons, so it's not just a one-off.

My setup is based on Ubuntu 20.04 and docker instead of podman:

root@node01:~# docker -v
Docker version 20.10.6, build 370c289

Do you know a workaround for this issue, or is this a known bug? I noticed that there are some other reports of the same behaviour in Octopus as well, and the solution at that time was to delete the /var/lib/ceph/mon folder.


Thanks.








_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



