Hi,
I wanted to explore the stretch mode in Pacific (16.2.4) and see how
it behaves during a DC failure. It seems I'm hitting the same or at
least a similar issue here. To verify whether stretch mode is the
culprit, I tore down the cluster and rebuilt it without stretch mode
(three hosts in three DCs) and started rebooting nodes. First I
rebooted one node, and the cluster came back to HEALTH_OK. Then I
rebooted two of the three nodes, and again everything recovered
successfully.
Then I rebuilt a five-node cluster: two DCs in stretch mode with three
MONs, one of them being a tiebreaker in a virtual third DC. The
stretch rule was applied (4 replicas across the four data nodes).
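For reference, the stretch setup roughly followed the stretch mode
docs [1]; something like this (bucket, host and MON names are just the
ones from my lab, and stretch_rule is the example rule from the docs):

# create the datacenter buckets and move the OSD hosts into them
ceph osd crush add-bucket dc1 datacenter
ceph osd crush add-bucket dc2 datacenter
ceph osd crush move dc1 root=default
ceph osd crush move dc2 root=default
ceph osd crush move pacific1 datacenter=dc1
ceph osd crush move pacific2 datacenter=dc1
ceph osd crush move pacific3 datacenter=dc2
ceph osd crush move pacific4 datacenter=dc2

# set the MON locations and switch to the connectivity election strategy
ceph mon set_location pacific1 datacenter=dc1
ceph mon set_location pacific3 datacenter=dc2
ceph mon set_location pacific5 datacenter=dc3
ceph mon set election_strategy connectivity

# enable stretch mode with pacific5 as the tiebreaker
ceph mon enable_stretch_mode pacific5 stretch_rule datacenter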
To test a DC failure I simply shut down the two nodes in DC2. Although
Ceph reduced the pool's min_size to 1, I couldn't read or write
anything on a mapped RBD, even though the cluster was still responsive
with two active MONs.
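Roughly what I checked while DC2 was down (pool and image names are
just the ones from my test setup):

ceph osd pool get rbd min_size    # Ceph had reduced this to 1
ceph -s                           # still responds with two of three MONs up
rbd map rbd/testimg               # image had been created before the outage
dd if=/dev/zero of=/dev/rbd0 bs=4M count=1 oflag=direct   # just hangs, no I/O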
When I booted the two nodes again, the cluster was not able to
recover: it ends up in a loop of restarting the MON containers (the
OSDs do recover eventually) until systemd shuts them down due to too
many restarts.
For a couple of seconds I get a ceph status, but I never get all three
MONs up at the same time. When two MONs are up and I restart the
missing one, a different MON gets shut down.
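In case it matters, this is roughly how I watched the restart loop on
one of the nodes (<fsid> stands for my cluster id):

systemctl status ceph-<fsid>@mon.pacific1.service
journalctl -u ceph-<fsid>@mon.pacific1.service --since "10 min ago"
cephadm logs --name mon.pacific1    # shows the same crash/restart loop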
I also see the error message mentioned earlier in this thread:
heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7ff3b3aa5700'
had timed out after 0.000000000s
Here is some more information, a stack trace from a MON failure:
---snip---
2021-05-25T15:44:26.988562+02:00 pacific1 conmon[5132]: 5
mon.pacific1@0(leader).paxos(paxos updating c 9288..9839) is_readable
= 1 - now=2021-05-25T13:44:26.730359+0000
lease_expire=2021-05-25T13:44:30.270907+0000 has v0 lc 9839
2021-05-25T15:44:26.988638+02:00 pacific1 conmon[5132]: debug -5>
2021-05-25T13:44:26.726+0000 7ff3b1aa1700 2 mon.pacific1@0(leader)
e13 send_reply 0x55e37aae3860 0x55e37affa9c0 auth_reply(proto 2 0 (0)
Success) v1
2021-05-25T15:44:26.988714+02:00 pacific1 conmon[5132]: debug -4>
2021-05-25T13:44:26.726+0000 7ff3b1aa1700 5
mon.pacific1@0(leader).paxos(paxos updating c 9288..9839) is_readable
= 1 - now=2021-05-25T13:44:26.731084+0000
lease_expire=2021-05-25T13:44:30.270907+0000 has v0 lc 9839
2021-05-25T15:44:26.988790+02:00 pacific1 conmon[5132]: debug -3>
2021-05-25T13:44:26.726+0000 7ff3b1aa1700 2 mon.pacific1@0(leader)
e13 send_reply 0x55e37b14def0 0x55e37ab11ba0 auth_reply(proto 2 0 (0)
Success) v1
2021-05-25T15:44:26.988929+02:00 pacific1 conmon[5132]: debug -2>
2021-05-25T13:44:26.730+0000 7ff3b1aa1700 5
mon.pacific1@0(leader).osd e117 send_incremental [105..117] to
client.84146
2021-05-25T15:44:26.989012+02:00 pacific1 conmon[5132]: debug -1>
2021-05-25T13:44:26.734+0000 7ff3b1aa1700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: In function 'void OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7ff3b1aa1700 time
2021-05-25T13:44:26.732857+0000
2021-05-25T15:44:26.989087+02:00 pacific1 conmon[5132]:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: 658: FAILED ceph_assert(target_v >=
9)
2021-05-25T15:44:26.989163+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.989239+02:00 pacific1 conmon[5132]: ceph version
16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.989314+02:00 pacific1 conmon[5132]: 1:
(ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x158) [0x7ff3bf61a59c]
2021-05-25T15:44:26.989388+02:00 pacific1 conmon[5132]: 2:
/usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.989489+02:00 pacific1 conmon[5132]: 3:
(OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned
long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.989560+02:00 pacific1 conmon[5132]: 4:
(OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&,
unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.989627+02:00 pacific1 conmon[5132]: 5:
(OSDMonitor::get_version(unsigned long, unsigned long,
ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.989693+02:00 pacific1 conmon[5132]: 6:
(OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
long)+0x301) [0x55e377b3a3c1]
2021-05-25T15:44:26.989759+02:00 pacific1 conmon[5132]: 7:
(OSDMonitor::send_incremental(unsigned int, MonSession*, bool,
boost::intrusive_ptr<MonOpRequest>)+0x104) [0x55e377b3b094]
2021-05-25T15:44:26.989825+02:00 pacific1 conmon[5132]: 8:
(OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
2021-05-25T15:44:26.989891+02:00 pacific1 conmon[5132]: 9:
(Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xe82)
[0x55e3779da402]
2021-05-25T15:44:26.989967+02:00 pacific1 conmon[5132]: 10:
(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d)
[0x55e377a002ed]
2021-05-25T15:44:26.990046+02:00 pacific1 conmon[5132]: 11:
(Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
2021-05-25T15:44:26.990113+02:00 pacific1 conmon[5132]: 12:
(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c)
[0x55e377a2ffdc]
2021-05-25T15:44:26.990179+02:00 pacific1 conmon[5132]: 13:
(DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
2021-05-25T15:44:26.990255+02:00 pacific1 conmon[5132]: 14:
(DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
2021-05-25T15:44:26.990330+02:00 pacific1 conmon[5132]: 15:
/lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
2021-05-25T15:44:26.990420+02:00 pacific1 conmon[5132]: 16: clone()
2021-05-25T15:44:26.990497+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.990573+02:00 pacific1 conmon[5132]: debug 0>
2021-05-25T13:44:26.742+0000 7ff3b1aa1700 -1 *** Caught signal
(Aborted) **
2021-05-25T15:44:26.990648+02:00 pacific1 conmon[5132]: in thread
7ff3b1aa1700 thread_name:ms_dispatch
2021-05-25T15:44:26.990723+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.990806+02:00 pacific1 conmon[5132]: ceph version
16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.990883+02:00 pacific1 conmon[5132]: 1:
/lib64/libpthread.so.0(+0x12b20) [0x7ff3bd114b20]
2021-05-25T15:44:26.990958+02:00 pacific1 conmon[5132]: 2: gsignal()
2021-05-25T15:44:26.991034+02:00 pacific1 conmon[5132]: 3: abort()
2021-05-25T15:44:26.991110+02:00 pacific1 conmon[5132]: 4:
(ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1a9) [0x7ff3bf61a5ed]
2021-05-25T15:44:26.991176+02:00 pacific1 conmon[5132]: 5:
/usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.991251+02:00 pacific1 conmon[5132]: 6:
(OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned
long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.991326+02:00 pacific1 conmon[5132]: 7:
(OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&,
unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.991393+02:00 pacific1 conmon[5132]: 8:
(OSDMonitor::get_version(unsigned long, unsigned long,
ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.991460+02:00 pacific1 conmon[5132]: 9:
(OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
long)+0x301) [0x55e377b3a3c1]
2021-05-25T15:44:26.991557+02:00 pacific1 conmon[5132]: 10:
(OSDMonitor::send_incremental(unsigned int, MonSession*, bool,
boost::intrusive_ptr<MonOpRequest>)+0x104) [0x55e377b3b094]
2021-05-25T15:44:26.991628+02:00 pacific1 conmon[5132]: 11:
(OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
2021-05-25T15:44:26.991695+02:00 pacific1 conmon[5132]: 12:
(Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xe82)
[0x55e3779da402]
2021-05-25T15:44:26.991761+02:00 pacific1 conmon[5132]: 13:
(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d)
[0x55e377a002ed]
2021-05-25T15:44:26.991827+02:00 pacific1 conmon[5132]: 14:
(Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
2021-05-25T15:44:26.991893+02:00 pacific1 conmon[5132]: 15:
(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c)
[0x55e377a2ffdc]
2021-05-25T15:44:26.991959+02:00 pacific1 conmon[5132]: 16:
(DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
2021-05-25T15:44:26.992025+02:00 pacific1 conmon[5132]: 17:
(DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
2021-05-25T15:44:26.992091+02:00 pacific1 conmon[5132]: 18:
/lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
2021-05-25T15:44:26.992156+02:00 pacific1 conmon[5132]: 19: clone()
---snip---
I can't tell whether this is due to the limited resources of my
virtual cluster, but since the non-stretch setup works as expected, I
figure this could be a problem with stretch mode. I can provide more
information if required, just let me know what you need.
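If it helps, I could also pull the recorded crash reports and raise
the MON debug level before reproducing it, e.g.:

ceph crash ls                       # list the recorded crashes
ceph crash info <crash-id>          # full backtrace of a single crash
ceph config set mon debug_mon 20    # more verbose MON logging
ceph config set mon debug_paxos 20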
Regards,
Eugen
[1] https://docs.ceph.com/en/latest/rados/operations/stretch-mode/
Quoting Adrian Nicolae <adrian.nicolae@xxxxxxxxxx>:
Hi guys,
I'm testing Ceph Pacific 16.2.4 in my lab before deciding whether to
put it in production on a 1 PB+ storage cluster with RGW-only access.
I noticed a weird issue with my MONs:
- if I reboot a MON host, the ceph-mon container does not start after
the reboot
- 'ceph orch ps' shows the following output:
mon.node01  node01  running (20h)   4m ago   20h   16.2.4     8d91d370c2b8  0a2e86af94b2
mon.node02  node02  running (115m)  12s ago  115m  16.2.4     8d91d370c2b8  51f4885a1b06
mon.node03  node03  stopped         4m ago   19h   <unknown>  <unknown>     <unknown>
(node03 is the host that was rebooted).
- I tried to start the MON container manually on node03 with
'/bin/bash
/var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.run'
and got the following output:
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700 0
mon.node03@-1(???).osd e408 crush map has features
3314933069573799936, adjusting msgr requires
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700 0
mon.node03@-1(???).osd e408 crush map has features
432629308056666112, adjusting msgr requires
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700 0
mon.node03@-1(???).osd e408 crush map has features
432629308056666112, adjusting msgr requires
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700 0
mon.node03@-1(???).osd e408 crush map has features
432629308056666112, adjusting msgr requires
cluster 2021-05-23T08:07:12.189243+0000 mgr.node01.ksitls
(mgr.14164) 36380 : cluster [DBG] pgmap v36392: 417 pgs: 417
active+clean; 33 KiB data, 605 MiB used, 651 GiB / 652 GiB avail;
9.6 KiB/s rd, 0 B/s wr, 15 op/s
debug 2021-05-23T08:24:25.196+0000 7f9a9e358700 1
mon.node03@-1(???).paxosservice(auth 1..51) refresh upgraded, format
0 -> 3
debug 2021-05-23T08:24:25.208+0000 7f9a88176700 1 heartbeat_map
reset_timeout 'Monitor::cpu_tp thread 0x7f9a88176700' had timed out
after 0.000000000s
debug 2021-05-23T08:24:25.208+0000 7f9a9e358700 0
mon.node03@-1(probing) e5 my rank is now 1 (was -1)
debug 2021-05-23T08:24:25.212+0000 7f9a87975700 0
mon.node03@1(probing) e6 removed from monmap, suicide.
root@node03:/home/adrian# systemctl status
ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@mon.node03.service
● ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@mon.node03.service -
Ceph mon.node03 for c2d41ac4-baf5-11eb-865d-2dc838a337a3
Loaded: loaded
(/etc/systemd/system/ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@.service;
enabled; vendor preset: enabled)
Active: inactive (dead) since Sun 2021-05-23 08:10:00 UTC; 16min ago
Process: 1176 ExecStart=/bin/bash
/var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.run
(code=exited, status=0/SUCCESS)
Process: 1855 ExecStop=/usr/bin/docker stop
ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3-mon.node03 (code=exited,
status=1/FAILURE)
Process: 1861 ExecStopPost=/bin/bash
/var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.poststop
(code=exited, status=0/SUCCESS)
Main PID: 1176 (code=exited, status=0/SUCCESS)
The only fix I could find was to redeploy the MON with:
ceph orch daemon rm mon.node03 --force
ceph orch daemon add mon node03
However, even though it works after the redeploy, an issue like this
doesn't give me much confidence to run it in a production environment.
I could reproduce it with two different MONs, so it's not just a
one-off.
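For completeness, this is roughly how I verified the redeploy
afterwards (node name as in my lab):

ceph orch ps | grep mon    # mon.node03 shows up as running again
ceph mon stat              # all three MONs are back in quorum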
My setup is based on Ubuntu 20.04 and Docker instead of Podman:
root@node01:~# docker -v
Docker version 20.10.6, build 370c289
Do you know a workaround for this issue, or is this a known bug? I
noticed similar complaints about the same behaviour in Octopus as
well, and the solution back then was to delete the /var/lib/ceph/mon
folder.
Thanks.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx