Re: Multiple OSDs down, and won't come up (possibly related to other Nautilus issues)


I don't know whether you have the same problem as us.
Several times we saw OSDs stay down after a reboot; we fixed it with:

  ls -al /dev/dm-*        # show the current owner of the device-mapper nodes
  chown ceph: /dev/dm-*   # owner ceph, group = ceph's login group (same as the legacy 'ceph.' form)
  for i in $(ceph osd tree | grep down | awk '{print $4}' | awk -F'.' '{print $2}'); do systemctl reset-failed ceph-osd@$i.service && systemctl start ceph-osd@$i.service; done
  ls -al /dev/dm-*
  ceph osd tree | grep down
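
A slightly more defensive way to pick out the down OSD IDs might look like the
sketch below. It only prints an ID when a field of the form osd.N is directly
followed by the word "down", so it does not depend on the name being in column 4
(the column shifts if the CLASS column is empty). The function name down_osd_ids
is made up for illustration, and the sketch assumes the plain-text layout of
"ceph osd tree", so treat it as a starting point rather than a tested tool.

```shell
#!/bin/sh
# Hypothetical helper: reads plain-text `ceph osd tree` output on stdin and
# prints the numeric ID of every OSD whose status field reads "down".
down_osd_ids() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "down") {
                sub(/^osd\./, "", $i)   # keep only the numeric part
                print $i
            }
    }'
}

# Usage (commented out; run as root on the affected OSD host):
# for id in $(ceph osd tree | down_osd_ids); do
#     systemctl reset-failed "ceph-osd@${id}.service"
#     systemctl start "ceph-osd@${id}.service"
# done
```

Note that "ceph osd tree" lists down OSDs for the whole cluster, so the loop will
also try IDs that live on other hosts; as far as I can tell those attempts just
fail, but it is worth checking the output of the final "ceph osd tree | grep down"
afterwards.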

> -----Original Message-----
> From: ceph@xxxxxxxxxx <ceph@xxxxxxxxxx>
> Sent: Wednesday, April 8, 2020 5:43 AM
> To: ceph-users@xxxxxxx
> Subject:  Re: Multiple OSDs down, and won't come up (possibly
> related to other Nautilus issues)
> 
> Hello,
> 
> I am just answering to let you know that there are people around who are
> seeing your messages.
> 
> Unfortunately I cannot help much, but I did have a strange issue like yours, where
> a few OSDs went down, or stayed up while I couldn't reach the node via ssh.
> 
> The case: we had done updates on our switch (LACP), and this led to a weird
> picture for the OSDs.
> 
> That said, I guess there could be a network issue in your case.
> 
> I am sorry not to be of much help.
> Hope your cluster is in a healthy state now!
> 
> - Mehmet
> 
> On 2 April 2020 15:32:38 CEST, aoanla@xxxxxxxxx wrote:
> >So, the recovery stalled a few more OSDs in, but looking at the disks
> >with OSDs marked down, I noticed that, despite systemctl reporting that
> >the OSD processes were all *up*, several of them had not written to
> >their logs since they rotated.
> >
> >Suspecting that these OSDs were stalled, I've started logging into each
> >OSD host and doing:
> >
> >ls -lh /var/log/ceph/*.log
> >
> >checking for logs with a size of 0,
> >
> >and then
> >
> >systemctl restart ceph-osd@xxx
> >
> >for all xxx with zero sized logs.
> >(I've checked each of these first with systemctl status ceph-osd@xxx
> >and they all report that the process is up...)
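
The check described above (find OSD logs of size 0, then restart the matching
units) could be scripted roughly as follows. This is only a sketch: the function
name empty_log_osd_ids is invented here, and it assumes the default log directory
/var/log/ceph and the default file naming ceph-osd.<id>.log. An empty log right
after rotation is a hint, not proof, that an OSD has stalled, so it seems wise to
review the list before restarting anything.

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of OSDs whose current log file is empty.
# $1: log directory (assumed default: /var/log/ceph).
# Assumes the default log naming ceph-osd.<id>.log.
empty_log_osd_ids() {
    find "${1:-/var/log/ceph}" -maxdepth 1 -type f -name 'ceph-osd.*.log' -size 0 |
        sed -n 's/.*ceph-osd\.\([0-9][0-9]*\)\.log$/\1/p' |
        sort -n
}

# Usage (commented out; run as root on each OSD host):
# for id in $(empty_log_osd_ids); do
#     systemctl status "ceph-osd@${id}" --no-pager
#     systemctl restart "ceph-osd@${id}"
# done
```
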
> >
> >This seems to be helping recovery dramatically...
> >
> >but if I look in the logs for each of the "frozen" OSDs before I
> >restart them [obviously, in the rotated log], there's no sign of why
> >the crash actually happens - there's a lot of complaining about how
> >they can't talk to other OSDs as in previous emails in this thread, and
> >then suddenly, nothing.
> >
> >It would be lovely if anyone could share their thoughts on what's
> >happening here.
> >_______________________________________________
> >ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an
> >email to ceph-users-leave@xxxxxxx


