The logs you probably want to look at here are the journal logs from the mgr and mon. If you have a copy of the cephadm tool on the host, you can run "cephadm ls --no-detail | grep systemd" to list the systemd unit names for the ceph daemons on the host, or just find the unit names the way you would for any other systemd unit (e.g. "systemctl -l | grep mgr" will probably include the mgr one). Then take a look at "journalctl -eu <systemd-unit-name>" for both the mgr and the mon units. I'd expect the end of the log to include a reason for the daemon going down.

As for the debug_ms (I think that's what you want over "debug mon") stuff, I think that would need to be a command line option for the mgr/mon process. For cephadm deployments, the systemd unit is run through a "unit.run" file at /var/lib/ceph/<cluster-fsid>/<daemon-name>/unit.run. The very end of that file is a very long podman or docker run command. If you add "--debug_ms 20" to the end of that command and then restart the systemd unit for that daemon, it should cause the extra debug logging to happen from that daemon. I would first check whether there are useful errors in the journal logs mentioned above before trying that, though.

On Mon, Jul 24, 2023 at 9:47 AM Renata Callado Borges <renato.callado@xxxxxxxxxxxx> wrote:

> Dear all,
>
> How are you?
>
> I have a cluster on Pacific with 3 hosts, each one with 1 mon, 1 mgr
> and 12 OSDs.
>
> One of the hosts, darkside1, has been out of quorum according to ceph
> status.
>
> Systemd showed 4 services dead, two mons and two mgrs.
>
> I managed to systemctl restart one mon and one mgr, but even after
> several attempts, the remaining mon and mgr services, when asked to
> restart, keep returning to a failed state after a few seconds. They try
> to auto-restart and then go into a failed state where systemd requires
> me to manually set them to "reset-failed" before trying to start again.
> But they never stay up. There are no clear messages about the issue in
> /var/log/ceph/cephadm.log.
>
> The host is still out of quorum.
>
> I have failed to "turn on debug" as per
> https://docs.ceph.com/en/pacific/rados/troubleshooting/log-and-debug/.
> It seems I do not know the proper incantation for "ceph daemon X config
> show", no string for X seems to satisfy this command. I have tried
> adding this:
>
> [mon]
>
> debug mon = 20
>
> To my ceph.conf, but no additional lines of log are sent to
> /var/log/cephadm.log
>
> so I'm sorry I can't provide more details.
>
> Could someone help me debug this situation? I am sure that if I just
> reboot the machine, it will start up the services properly, as it always
> has done, but I would prefer to fix this without this action.
>
> Cordially,
>
> Renata.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
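
The steps suggested in the reply at the top (read the journal, then append "--debug_ms 20" to the run command at the end of unit.run and restart the unit) could be sketched roughly as below. This is only an illustration under assumptions: the helper names unit_run_path and append_daemon_args are made up for this example, the <...> placeholders have to be filled in by hand, and it assumes the last line of unit.run really is the podman/docker run command, as described above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of cephadm.

# Build the unit.run path for a cephadm-deployed daemon.
unit_run_path() {
    local fsid="$1" daemon="$2"
    printf '/var/lib/ceph/%s/%s/unit.run\n' "$fsid" "$daemon"
}

# Append extra daemon arguments to the LAST line of a unit.run file.
# Assumes that last line is the long podman/docker run command and that
# the arguments contain no '|' or '&' characters (they are used in sed).
append_daemon_args() {
    local file="$1"; shift
    local extra="$*"
    # Do nothing if the arguments are already on the last line.
    if tail -n 1 "$file" | grep -qF -- "$extra"; then
        return 0
    fi
    # Keep a backup, then rewrite the file with the arguments appended.
    cp -p "$file" "$file.bak"
    sed "\$ s|\$| ${extra}|" "$file.bak" > "$file"
}

# Intended use on the affected host, as root (placeholders left as-is):
#   journalctl -eu "<systemd-unit-name>"        # check for errors first
#   append_daemon_args "$(unit_run_path "<cluster-fsid>" "<daemon-name>")" --debug_ms 20
#   systemctl restart "<systemd-unit-name>"
```

The backup copy (unit.run.bak) makes it easy to drop the extra logging again once the cause is found. Since cephadm writes unit.run itself, I would not expect a hand edit like this to survive a redeploy of the daemon.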