Hi,

On a daily basis, one of my monitors goes down:

[root@cube ~]# ceph health detail
HEALTH_WARN 1 failed cephadm daemon(s); 1/3 mons down, quorum rhel1.robeckert.us,story
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
    daemon mon.cube on cube.robeckert.us is in error state
[WRN] MON_DOWN: 1/3 mons down, quorum rhel1.robeckert.us,story
    mon.cube (rank 2) addr [v2:192.168.2.142:3300/0,v1:192.168.2.142:6789/0] is down (out of quorum)

[root@cube ~]# ceph --version
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)

I have a script that copies the mon data from another server; after that, the monitor restarts and runs well for a while.

It is always the same monitor, and when I look at the logs, the only thing I really see is the cephadm log showing it down:

2021-04-28 10:07:26,173 DEBUG Running command: /usr/bin/podman --version
2021-04-28 10:07:26,217 DEBUG /usr/bin/podman: stdout podman version 2.2.1
2021-04-28 10:07:26,222 DEBUG Running command: /usr/bin/podman inspect --format {{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index .Config.Labels "io.ceph.version"}} ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867-osd.2
2021-04-28 10:07:26,326 DEBUG /usr/bin/podman: stdout fab17e5242eb4875e266df19ca89b596a2f2b1d470273a99ff71da2ae81eeb3c,docker.io/ceph/ceph:v15,5b724076c58f97872fc2f7701e8405ec809047d71528f79da452188daf2af72e,2021-04-26 17:13:15.54183375 -0400 EDT,
2021-04-28 10:07:26,328 DEBUG Running command: systemctl is-enabled ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867@xxxxxxxx
2021-04-28 10:07:26,334 DEBUG systemctl: stdout enabled
2021-04-28 10:07:26,335 DEBUG Running command: systemctl is-active ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867@xxxxxxxx
2021-04-28 10:07:26,340 DEBUG systemctl: stdout failed
2021-04-28 10:07:26,340 DEBUG Running command: /usr/bin/podman --version
2021-04-28 10:07:26,395 DEBUG /usr/bin/podman: stdout podman version 2.2.1
2021-04-28 10:07:26,402 DEBUG Running command: /usr/bin/podman inspect --format {{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index .Config.Labels "io.ceph.version"}} ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867-mon.cube
2021-04-28 10:07:26,526 DEBUG /usr/bin/podman: stdout 04e7c673cbacf5160427b0c3eb2f0948b2f15d02c58bd1d9dd14f975a84cfc6f,docker.io/ceph/ceph:v15,5b724076c58f97872fc2f7701e8405ec809047d71528f79da452188daf2af72e,2021-04-28 08:54:57.614847512 -0400 EDT,

I don't know if it matters, but this server is an AMD 3600XT, while my other two servers, which have had no issues, are Intel-based. The root file system was originally on an SSD, and I have since switched to NVMe, so I have ruled out controller or drive issues. (I didn't see anything in dmesg anyway.)

If someone could point me in the right direction on where to troubleshoot next, I would appreciate it.

Thanks,
Rob Eckert