octopus rbd cluster just stopped out of nowhere (>20k slow ops)


 



hi,
maybe someone here can help me debug an issue we hit today.

Today one of our clusters came to a grinding halt, with 2/3 of our OSDs
reporting slow ops.
The only way to get it back to work quickly was to restart all OSD daemons.

The cluster is an Octopus cluster with 150 enterprise-SSD OSDs. The last
work on the cluster was syncing in a new node four days ago.

The only health issue reported was SLOW_OPS. No slow pings on the
networks, no restarting OSDs, nothing.

I was able to pin it down to a 20-second timeframe, and I read ALL the
logs in a 20-minute window around the incident.

I haven't found any clues.
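In case anyone wants to poke at a similar slow-ops situation, this is
roughly the kind of triage I mean, as a sketch: it assumes the ceph CLI
and local admin-socket access on the OSD host, and osd.0 is only a
placeholder id, not one of our actual OSDs.

```shell
# Placeholder OSD id; substitute an OSD that is reporting slow ops.
OSD_ID=0

if command -v ceph >/dev/null 2>&1; then
    # Which daemons the cluster currently blames for slow ops
    ceph health detail

    # Ops currently stuck on this OSD (run on the host carrying it)
    ceph daemon "osd.${OSD_ID}" dump_ops_in_flight

    # Recently completed slow ops, with per-phase timestamps
    ceph daemon "osd.${OSD_ID}" dump_historic_slow_ops
else
    echo "ceph CLI not available on this host"
fi
```

The historic dump is the most useful of the three here, since it keeps
the per-phase timings even after the ops have completed.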

Has anyone encountered this in the past?

-- 
The "UTF-8 problems" self-help group will, as an exception, meet in the
large hall this time.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



