Hi,
Since my post, we've been speaking with a member of the Ceph dev team.
He did, at first, believe it was an issue linked to the common
performance degradation seen after huge delete operations, so we ran
offline compactions on all our OSDs. That fixed nothing, and we are now
going through the logs to try and figure this out.
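
For reference, by offline compaction I mean the usual stop-the-OSD-and-
compact procedure with ceph-kvstore-tool, roughly this (the OSD id and
data path are just placeholders):

  systemctl stop ceph-osd@42
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-42 compact
  systemctl start ceph-osd@42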
To answer your question, no, the OSD doesn't restart after it logs the
timeout. It manages to get back online by itself, at the cost of
sluggish performance for the cluster and high iowait on the VMs.
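
As far as I understand the options, the 15 s in that log line is
osd_op_thread_timeout, which only makes the OSD complain in its
heartbeat map; it's osd_op_thread_suicide_timeout (150 s by default, if
I remember correctly) that would actually make the OSD abort and
restart, which would match the fact that ours stay up. The values can
be checked with:

  ceph config get osd osd_op_thread_timeout
  ceph config get osd osd_op_thread_suicide_timeout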
We mostly run RBD workloads.
Deep scrubs don't appear to change anything one way or the other;
deactivating scrubs altogether had no impact on performance either.
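
To be precise, by deactivating scrubs I mean setting the cluster-wide
flags while we observed, i.e. something like:

  ceph osd set noscrub
  ceph osd set nodeep-scrub
  # observe for a while, then
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub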
Furthermore, I'll stress that this has only been happening since we
upgraded to the latest Pacific yesterday.
On 9/7/23 10:49, Stefan Kooman wrote:
On 07-09-2023 09:05, J-P Methot wrote:
Hi,
We're running latest Pacific on our production cluster and we've been
seeing the dreaded 'OSD::osd_op_tp thread 0x7f346aa64700' had timed
out after 15.000000954s' error. We have reasons to believe this
happens each time the RocksDB compaction process is launched on an
OSD. My question is, does the cluster detecting that an OSD has timed
out interrupt the compaction process? This seems to be what's
happening, but it's not immediately obvious. We are currently facing
an infinite loop of random OSDs timing out and if the compaction
process is interrupted without finishing, it may explain that.
Does the OSD also restart after it logged the timeouts?
You might want to perform an offline compaction every $timeperiod to
fix any potential RocksDB degradation. That's what we do. What kind of
workload do you run (i.e. RBD, CephFS, RGW)?
Do you also see these timeouts occur during deep-scrubs?
Gr. Stefan
--
Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx