Re: octopus rbd cluster just stopped out of nowhere (>20k slow ops)

Sven Kieske <S.Kieske@xxxxxxxxxxx> · Mon, 5 Dec 2022 13:24:18 +0000

On Sat, 2022-12-03 at 01:54 +0100, Boris Behrens wrote:
> hi,
> maybe someone here can help me to debug an issue we faced today.
> 
> Today one of our clusters came to a grinding halt with 2/3 of our OSDs
> reporting slow ops.
> The only way to get it working again quickly was to restart all OSD daemons.
> 
> The cluster is an octopus cluster with 150 enterprise SSD OSDs. Last work
> on the cluster: synced in a node 4 days ago.
> 
> The only health issue reported was SLOW_OPS. No slow pings
> on the networks. No restarting OSDs. Nothing.
> 
> I was able to pin it down to a 20s window, and I read ALL the logs in a
> 20-minute window around this issue.
> 
> I haven't found any clues.
> 
> Maybe someone encountered this in the past?

Do you happen to run your RocksDB on a dedicated caching device (NVMe SSD)?

I observed slow ops in Octopus after a faulty NVMe SSD was inserted into one Ceph server.
As was said in other mails, try to isolate your root cause.

Maybe the node added 4 days ago is the culprit here?

We were able to pinpoint the NVMe by monitoring the slow OSDs:
what they all had in common was the same NVMe cache device.
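
A rough sketch of that kind of correlation, in case it helps. It assumes you already have the list of OSD ids that reported slow ops (for example collected by hand from the health output) and the parsed JSON from `ceph osd metadata`. The field names `hostname` and `bluefs_db_devices` are what I would expect in that output for BlueStore OSDs with a dedicated DB device, but please check them against your own cluster. The function name and the sample data are made up for illustration.

```python
from collections import Counter


def shared_db_devices(osd_metadata, slow_osd_ids):
    """Count slow OSDs per (hostname, DB device) pair, most frequent first.

    osd_metadata: list of dicts, one per OSD (assumed shape of the parsed
                  `ceph osd metadata` JSON output).
    slow_osd_ids: iterable of integer OSD ids that reported slow ops.
    """
    slow = set(slow_osd_ids)
    counts = Counter()
    for entry in osd_metadata:
        if entry.get("id") not in slow:
            continue
        host = entry.get("hostname", "unknown")
        # Assumption: OSDs with a dedicated DB device expose it under this key.
        db_dev = entry.get("bluefs_db_devices") or "(no dedicated DB device)"
        counts[(host, db_dev)] += 1
    return counts.most_common()


# Hypothetical sample data, not taken from any real cluster.
sample_metadata = [
    {"id": 0, "hostname": "node-a", "bluefs_db_devices": "nvme0n1"},
    {"id": 1, "hostname": "node-a", "bluefs_db_devices": "nvme0n1"},
    {"id": 2, "hostname": "node-a", "bluefs_db_devices": "nvme1n1"},
    {"id": 3, "hostname": "node-b", "bluefs_db_devices": "nvme0n1"},
]
sample_slow = [0, 1, 3]

for (host, dev), n in shared_db_devices(sample_metadata, sample_slow):
    print(f"{host} {dev}: {n} slow OSD(s)")
```

If most of the slow OSDs pile up on one host/device pair, that device (or that host) is the first thing to look at. If they are spread evenly, the cause is probably somewhere else.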

IMHO you should always benchmark new hardware and run burn-in tests, although
that is not always possible due to environment constraints.

-- 
Regards

Sven Kieske
Systems engineer

Mittwald CM Service GmbH & Co. KG
Königsberger Straße 4-6
32339 Espelkamp

Tel.: 05772 / 293-900
Fax: 05772 / 293-333

https://www.mittwald.de

