Re: octopus rbd cluster just stopped out of nowhere (>20k slow ops)

@Alex:
the issue is resolved for now, but I fear it might come back at some
point. The cluster had been running fine for months.
I will check whether we can restart the switches easily. Host reboots
should also be no problem.

There is no "implicated OSD" message in the logs.
All OSDs were recreated 3 months ago (sync out, destroy, wipe, create,
sync in). Maybe I will reinstall with Ubuntu 20.04 (currently CentOS 7) to
get a newer kernel.
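
To look for a common host without the "implicated OSD" lines, something
like the following could help. It is only a rough sketch and has not been
run against this cluster: it counts how often each osd.N shows up in log
lines that mention "slow", then sums those counts per host. The sample log
lines, the function names, and the OSD-to-host mapping are all made up for
illustration; the real log format may differ, and the mapping would have
to come from the actual cluster layout.

```python
import re
from collections import Counter

# Matches tokens such as "osd.12" and captures the numeric id.
OSD_RE = re.compile(r"\bosd\.(\d+)\b")


def count_slow_op_osds(lines):
    """Count osd.N mentions in lines containing 'slow' (case-insensitive)."""
    counts = Counter()
    for line in lines:
        if "slow" not in line.lower():
            continue
        for osd_id in OSD_RE.findall(line):
            counts[int(osd_id)] += 1
    return counts


def group_by_host(osd_counts, osd_to_host):
    """Sum per-OSD counts per host; unmapped OSDs are filed under 'unknown'."""
    hosts = Counter()
    for osd_id, n in osd_counts.items():
        hosts[osd_to_host.get(osd_id, "unknown")] += n
    return hosts


if __name__ == "__main__":
    # Hypothetical log lines, not taken from a real ceph.log.
    sample = [
        "osd.3 reported slow ops",
        "slow request seen on osd.3 and osd.7",
        "osd.1 is healthy",
    ]
    # Hypothetical mapping; fill in from the real cluster layout.
    mapping = {3: "hostA", 7: "hostB"}
    per_osd = count_slow_op_osds(sample)
    for host, n in group_by_host(per_osd, mapping).most_common():
        print(host, n)
```

If one host or one switch side clearly dominates the counts, that would be
the first place to check NICs, LACP, and cables.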

On Sun, 4 Dec 2022 at 19:58, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:

> Hi Boris,
>
> These waits seem to be all over the place.  Usually, in the main ceph.log
> you see "implicated OSD" messages - I would try to find some commonality
> with either a host, switch, or something like that.  Can be bad ports/NICs,
> LACP problems, even bad cables sometimes.  I try to isolate an area that is
> problematic.  Sometimes rebooting OSD hosts one at a time.  Rebooting
> switches (if stacked/MLAG) one at a time.  Something has got to be there,
> which makes the problem go away.
> --
> Alex Gorbachev
> https://alextelescope.blogspot.com
>
-- 
The "UTF-8 problems" self-help group will, as an exception, meet in the
large hall this time.