Re: Multiple OSDs down, and won't come up (possibly related to other Nautilus issues)

ceph@xxxxxxxxxx · Tue, 07 Apr 2020 23:42:37 +0200

Hello,

I am just answering to let you know that there are people around who are seeing your messages.

Unfortunately I cannot help much, but I did have a strange issue like yours, where a few OSDs either went down, or stayed up while I could not reach the node via ssh.

The case: we had done updates on our switch (LACP), and this led to strange behaviour from the OSDs.

That said, I guess there could be a network issue in your case.

I am sorry not to be of much help.
I hope your cluster is in a healthy state now!

- Mehmet

On 2 April 2020 15:32:38 CEST, aoanla@xxxxxxxxx wrote:
>So, the recovery stalled a few more OSDs in, but looking at the disks
>with OSDs marked down, I noticed that, despite systemctl reporting that
>the OSD processes were all *up*, several of them had not written to
>their logs since they rotated.
>
>Suspecting that these OSDs were stalled, I've started logging into each
>OSD host and doing:
>
>ls -lh /var/log/ceph/*.log
>
>checking for logs with a size of 0, 
>
>and then 
>
>systemctl restart ceph-osd@xxx 
>
>for all xxx with zero sized logs. 
>(I've checked each of these first with 
>systemctl status ceph-osd@xxx
>and they all report that the process is up...)
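
A minimal sketch of the check described above, in case it saves some typing across many hosts. It assumes the default log naming /var/log/ceph/ceph-osd.<id>.log; the function names and the RESTART variable are made up for this example and are not part of Ceph. By default it only prints the restart commands, so the list can be reviewed first.

```shell
#!/bin/sh
# Print the IDs of OSDs whose current log file is empty.
# Assumption: logs are named ceph-osd.<id>.log (the default naming).
stalled_osd_ids() {
    log_dir="${1:-/var/log/ceph}"
    for log in "$log_dir"/ceph-osd.*.log; do
        # Skip the unexpanded glob when no OSD logs exist.
        [ -e "$log" ] || continue
        # -s is true for non-empty files, so negate it to find empty ones.
        if [ ! -s "$log" ]; then
            name=$(basename "$log" .log)   # e.g. ceph-osd.12
            echo "${name#ceph-osd.}"       # e.g. 12
        fi
    done
}

# Dry run by default: print the restart commands instead of running them.
# RESTART=1 (a hypothetical switch for this sketch) really restarts.
restart_stalled_osds() {
    for id in $(stalled_osd_ids "$1"); do
        if [ "${RESTART:-0}" = "1" ]; then
            systemctl restart "ceph-osd@$id"
        else
            echo "systemctl restart ceph-osd@$id"
        fi
    done
}
```

Calling restart_stalled_osds with no argument looks in /var/log/ceph. An empty log only suggests that an OSD is stalled, so it is still worth checking each one with systemctl status before restarting it.
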
>
>This seems to be helping recovery dramatically...
>
>but if I look in the logs for each of the "frozen" OSDs before I
>restart them [obviously, in the rotated log], there's no sign of why
>the crash actually happens - there's a lot of complaining about how
>they can't talk to other OSDs as in previous emails in this thread, and
>then suddenly, nothing.
>
>It would be lovely if anyone could comment on thoughts about what's
>happening here.
>_______________________________________________
>ceph-users mailing list -- ceph-users@xxxxxxx
>To unsubscribe send an email to ceph-users-leave@xxxxxxx