Hello,

I am just answering to let you know that there are people around who are seeing your messages.

Unfortunately I cannot help much, but I did have a strange issue like yours, where a few OSDs went down, or stayed up while I could not reach the node via SSH.

In our case, we had done updates on our switch (LACP), and this led to a weird picture for the OSDs. That said, I guess there could be a network issue in your case.

I am sorry not to be of more help. I hope your cluster is in a healthy state now!

- Mehmet

On 2 April 2020 at 15:32:38 CEST, aoanla@xxxxxxxxx wrote:
>So, the recovery stalled a few more OSDs in, but looking at the disks
>with OSDs marked down, I noticed that, despite systemctl reporting that
>the OSD processes were all *up*, several of them had not written to
>their logs since they rotated.
>
>Suspecting that these OSDs were stalled, I've started logging into each
>OSD host and doing:
>
>ls -lh /var/log/ceph/*.log
>
>checking for logs with a size of 0,
>
>and then
>
>systemctl restart ceph-osd@xxx
>
>for all xxx with zero sized logs.
>(I've checked each of these first with
>systemctl status ceph-osd@xxx
>and they all report that the process is up...)
>
>This seems to be helping recovery dramatically...
>
>but if I look in the logs for each of the "frozen" OSDs before I
>restart them [obviously, in the rotated log], there's no sign of why
>the crash actually happens - there's a lot of complaining about how
>they can't talk to other OSDs as in previous emails in this thread, and
>then suddenly, nothing.
>
>It would be lovely if anyone could comment on thoughts about what's
>happening here.
>_______________________________________________
>ceph-users mailing list -- ceph-users@xxxxxxx
>To unsubscribe send an email to ceph-users-leave@xxxxxxx
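
For anyone repeating the check from the quoted message on many hosts, here is a rough sketch of it as a small shell helper: find OSD logs with a size of 0, then restart the matching OSDs. It assumes the logs are named ceph-osd.<id>.log under /var/log/ceph, which may not match your setup. The function names and the APPLY variable are made up for this example and are not part of Ceph or systemd. An empty log is only a hint that an OSD is stalled, so by default the helper prints the restart commands instead of running them.

```shell
#!/bin/sh
# Sketch only. Assumption: OSD logs are named ceph-osd.<id>.log.
# stalled_osd_ids, restart_stalled_osds and APPLY are hypothetical names.

# Print the id of every OSD whose current log file is empty.
stalled_osd_ids() {
    log_dir=${1:-/var/log/ceph}
    for f in "$log_dir"/ceph-osd.*.log; do
        [ -f "$f" ] || continue   # skip the unexpanded glob when nothing matches
        [ -s "$f" ] && continue   # -s is true for non-empty files; skip those
        id=${f##*/ceph-osd.}      # strip directory and "ceph-osd." prefix
        id=${id%.log}             # strip ".log" suffix
        echo "$id"
    done
}

# Dry run by default: print the restart commands.
# Set APPLY=1 to actually run systemctl.
restart_stalled_osds() {
    for id in $(stalled_osd_ids "${1:-}"); do
        if [ "${APPLY:-0}" = 1 ]; then
            systemctl restart "ceph-osd@$id"
        else
            echo "systemctl restart ceph-osd@$id"
        fi
    done
}

# Usage, after sourcing this file on an OSD host:
#   restart_stalled_osds            # show what would be restarted
#   APPLY=1 restart_stalled_osds    # restart them
```

Running the dry run first and checking "systemctl status" for each listed OSD keeps this close to the manual procedure in the quoted message.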