Re: octopus rbd cluster just stopped out of nowhere (>20k slow ops)

On Sa, 2022-12-03 at 01:54 +0100, Boris Behrens wrote:
> hi,
> maybe someone here can help me to debug an issue we faced today.
> 
> Today one of our clusters came to a grinding halt with 2/3 of our OSDs
> reporting slow ops.
> The only option to get it working again quickly was to restart all OSD daemons.
> 
> The cluster is an octopus cluster with 150 enterprise SSD OSDs. Last work
> on the cluster: synced in a node 4 days ago.
> 
> The only health issue, that was reported, was the SLOW_OPS. No slow pings
> on the networks. No restarting OSDs. Nothing.
> 
> I was able to pin it down to a 20s timeframe and I read ALL the logs in a 20
> minute timeframe around this issue.
> 
> I haven't found any clues.
> 
> Maybe someone encountered this in the past?

Do you happen to run your RocksDB on a dedicated caching device (NVMe SSD)?

I observed slow ops in Octopus after a faulty NVMe SSD was inserted into one of our Ceph servers.
As was said in other mails, try to isolate the root cause.

Maybe the node added 4 days ago is the culprit here?

We were able to pinpoint the NVMe by monitoring the slow OSDs:
what they had in common in our case was the same NVMe cache device.
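
A rough sketch of that kind of correlation, in case it helps. It assumes you
already have the list of OSD ids that report slow ops (e.g. read off
`ceph health detail`) and the parsed JSON from `ceph osd metadata --format json`.
The function name is made up for illustration, and the metadata keys used here
("id", "hostname", "bluefs_db_devices") are what I would expect for BlueStore
OSDs; please check them against the output of your own cluster.

```python
from collections import defaultdict


def group_slow_osds_by_db_device(slow_osd_ids, osd_metadata):
    """Group slow OSD ids by (hostname, DB device).

    osd_metadata is a list of dicts shaped like the parsed output of
    `ceph osd metadata --format json`. The keys "id", "hostname" and
    "bluefs_db_devices" are assumptions; verify them on your cluster.
    """
    slow = set(slow_osd_ids)
    groups = defaultdict(list)
    for entry in osd_metadata:
        osd_id = entry["id"]
        if osd_id not in slow:
            continue
        host = entry.get("hostname", "unknown")
        # OSDs without a dedicated DB device end up in their own bucket.
        db_dev = entry.get("bluefs_db_devices") or "no dedicated db"
        groups[(host, db_dev)].append(osd_id)
    # Largest groups first: a device shared by many slow OSDs is a suspect.
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the input.
    sample_metadata = [
        {"id": 0, "hostname": "node1", "bluefs_db_devices": "nvme0n1"},
        {"id": 1, "hostname": "node1", "bluefs_db_devices": "nvme0n1"},
        {"id": 2, "hostname": "node1", "bluefs_db_devices": "nvme1n1"},
        {"id": 3, "hostname": "node2", "bluefs_db_devices": "nvme0n1"},
    ]
    for (host, dev), osds in group_slow_osds_by_db_device([0, 1, 3], sample_metadata):
        print(f"{host} {dev}: {len(osds)} slow OSD(s) {osds}")
```

If most slow OSDs fall into one (host, device) group, that device is worth a
closer look; if they are spread evenly, the cause is probably elsewhere.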

You should always benchmark new hardware and run burn-in tests, IMHO, although
that is not always possible due to environment constraints.

-- 
Mit freundlichen Grüßen / Regards

Sven Kieske
Systementwickler / systems engineer
 
 
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 4-6
32339 Espelkamp
 
Tel.: 05772 / 293-900
Fax: 05772 / 293-333
 
https://www.mittwald.de
 
Managing directors: Robert Meyer, Florian Jürgens
 
Tax no.: 331/5721/1033, VAT ID: DE814773217, HRA 6640, district court (AG) Bad Oeynhausen
General partner: Robert Meyer Verwaltungs GmbH, HRB 13260, district court (AG) Bad Oeynhausen

Information on data processing in the course of our business activities
pursuant to Art. 13-14 GDPR is available at www.mittwald.de/ds.

