Dear all,
How are you?
I have a cluster on Pacific with 3 hosts, each one with 1 mon, 1 mgr
and 12 OSDs.
One of the hosts, darkside1, has been out of quorum according to ceph
status.
Systemd showed four dead services: two mons and two mgrs.
I managed to restart one mon and one mgr with systemctl. The remaining
mon and mgr services, however, fall back into a failed state a few
seconds after every restart attempt. They try to auto-restart and then
stay failed, at which point systemd requires me to run "reset-failed"
on them before I can try to start them again.
But they never stay up. There are no clear messages about the issue in
/var/log/ceph/cephadm.log.
The host is still out of quorum.
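
For reference, the retry cycle described above can be sketched as
follows. This assumes a cephadm deployment, where systemd units are
named ceph-<fsid>@<daemon>.service. The fsid below is a placeholder and
the daemon name mon.darkside1 is an assumption (mgr names usually carry
a random suffix). The script only prints the commands; it does not run
them.

```shell
#!/bin/sh
# Placeholder fsid (hypothetical); the real value comes from `ceph fsid`.
FSID="00000000-0000-0000-0000-000000000000"

# Build the systemd unit name for a cephadm-managed daemon.
# $1 = cluster fsid, $2 = daemon name such as mon.darkside1
unit_for() {
    printf 'ceph-%s@%s.service\n' "$1" "$2"
}

# Print the reset/start/inspect sequence for one daemon.
retry_cmds() {
    unit=$(unit_for "$FSID" "$1")
    echo "systemctl reset-failed $unit"
    echo "systemctl start $unit"
    echo "systemctl status --no-pager $unit"
    # Assumption: the daemon's own output lands in the journal,
    # not in cephadm.log.
    echo "journalctl -u $unit --since '10 minutes ago' --no-pager"
}

retry_cmds mon.darkside1
```

The last printed command is where I would expect the daemon's own
messages to appear, if the daemons log to the journal rather than to a
file.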
I have failed to "turn on debug" as per
https://docs.ceph.com/en/pacific/rados/troubleshooting/log-and-debug/.
It seems I do not know the proper incantation for "ceph daemon X config
show": no string for X seems to satisfy this command. I have tried
adding this:
[mon]
debug mon = 20
to my ceph.conf, but no additional log lines are written to
/var/log/ceph/cephadm.log,
so I'm sorry I can't provide more details.
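
For context, a sketch of the two forms that X normally takes in "ceph
daemon X config show": a daemon name, or the path to the daemon's admin
socket. The daemon name, the placeholder fsid and the socket location
are all assumptions for a cephadm deployment, not values confirmed on
this host. The script only prints the candidate commands.

```shell
#!/bin/sh
# Assumed values: the mon id is taken to be the hostname, and the fsid is
# a placeholder (the real one comes from `ceph fsid`).
FSID="00000000-0000-0000-0000-000000000000"
DAEMON="mon.darkside1"

# Host-side path where a cephadm-managed daemon is assumed to expose its
# admin socket. $1 = cluster fsid, $2 = daemon name
asok_for() {
    printf '/var/run/ceph/%s/ceph-%s.asok\n' "$1" "$2"
}

# Form 1: X is the daemon name.
echo "ceph daemon $DAEMON config show"
# Form 2: X is the admin socket path.
echo "ceph daemon $(asok_for "$FSID" "$DAEMON") config show"
```

Either form needs the admin socket, which only exists while the daemon
is running, so both would be expected to fail against a mon that does
not stay up.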
Could someone help me debug this situation? I am sure that if I just
reboot the machine, it will start up the services properly, as it always
has done, but I would prefer to fix this without rebooting.
Cordially,
Renata.