I don't know whether you have the same problem as us. Several times we saw
OSDs stay down after a reboot; we fixed it with:

ls -al /dev/dm-*
chown ceph. /dev/dm-*
for i in `ceph osd tree | grep down | awk '{print $4}' | awk -F'.' '{print $2}'`; do systemctl reset-failed ceph-osd@$i.service && systemctl start ceph-osd@$i.service; done
ls -al /dev/dm-*
ceph osd tree | grep down

> -----Original Message-----
> From: ceph@xxxxxxxxxx <ceph@xxxxxxxxxx>
> Sent: Wednesday, April 8, 2020 5:43 AM
> To: ceph-users@xxxxxxx
> Subject: Re: Multiple OSDs down, and won't come up (possibly
> related to other Nautilus issues)
>
> Hello,
>
> I am just answering to let you know that there are people around who are
> seeing your messages.
>
> Unfortunately I cannot help much, but I did have a strange issue like yours,
> where a few OSDs went down, or stayed up but I couldn't reach the node via ssh.
>
> The case: we did updates on our switch (LACP), and this led to a weird
> picture for the OSDs.
>
> That said, I guess there could be a network issue in your case.
>
> I am sorry not to be of much help.
> Hope your cluster is in a healthy state now!
>
> - Mehmet
>
> On 2 April 2020 at 15:32:38 CEST, aoanla@xxxxxxxxx wrote:
> >So, the recovery stalled a few more OSDs in, but looking at the disks
> >with OSDs marked down, I noticed that, despite systemctl reporting that
> >the OSD processes were all *up*, several of them had not written to
> >their logs since they rotated.
> >
> >Suspecting that these OSDs were stalled, I've started logging into each
> >OSD host and doing:
> >
> >ls -lh /var/log/ceph/*.log
> >
> >checking for logs with a size of 0,
> >
> >and then
> >
> >systemctl restart ceph-osd@xxx
> >
> >for all xxx with zero-sized logs.
> >(I've checked each of these first with systemctl status ceph-osd@xxx
> >and they all report that the process is up...)
> >
> >This seems to be helping recovery dramatically...
> >
> >but if I look in the logs for each of the "frozen" OSDs before I
> >restart them [obviously, in the rotated log], there's no sign of why
> >the crash actually happens - there's a lot of complaining about how
> >they can't talk to other OSDs, as in previous emails in this thread, and
> >then suddenly, nothing.
> >
> >It would be lovely if anyone could share thoughts on what's
> >happening here.
> >_______________________________________________
> >ceph-users mailing list -- ceph-users@xxxxxxx
> >To unsubscribe send an email to ceph-users-leave@xxxxxxx
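
For anyone wanting to script the zero-size-log check described in the quoted message, here is a rough sketch. It assumes the default log naming ceph-osd.<id>.log under /var/log/ceph; the function name stalled_osd_ids and the CEPH_LOG_DIR variable are made up for illustration. It only prints the restart commands rather than running them, since an empty log right after rotation does not by itself prove that an OSD is stalled.

```shell
#!/bin/sh
# Print the ids of OSDs whose current log file is empty (zero bytes).
# Assumption: logs are named ceph-osd.<id>.log inside the given directory.
stalled_osd_ids() {
    log_dir=${1:-/var/log/ceph}
    for log in "$log_dir"/ceph-osd.*.log; do
        [ -f "$log" ] || continue   # glob matched nothing, or not a regular file
        [ -s "$log" ] && continue   # log has content, so skip it
        id=${log##*/ceph-osd.}      # strip the directory and "ceph-osd." prefix
        id=${id%.log}               # strip the ".log" suffix
        printf '%s\n' "$id"
    done
}

# Dry run: show the restart commands instead of executing them.
for id in $(stalled_osd_ids "${CEPH_LOG_DIR:-/var/log/ceph}"); do
    echo "systemctl restart ceph-osd@$id"
done
```

Review the printed list (for example against systemctl status ceph-osd@<id> and ceph osd tree) before running any of the commands by hand.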