Re: Rocksdb compaction and OSD timeout

Konstantin Shalygin <k0ste@xxxxxxxx> · Thu, 7 Sep 2023 19:18:27 +0300

Hi,

> On 7 Sep 2023, at 18:21, J-P Methot <jp.methot@xxxxxxxxxxxxxxxxx> wrote:
> 
> Since my post, we've been speaking with a member of the Ceph dev team. At first, he believed it was an issue linked to the common performance degradation after huge delete operations, so we ran offline compactions on all our OSDs. That fixed nothing, and we are going through the logs to try to figure this out.
> 
> To answer your question: no, the OSD doesn't restart after it logs the timeout. It manages to get back online by itself, at the cost of sluggish cluster performance and high iowait on VMs.
> 
> We mostly run RBD workloads.
> 
> Running deep scrubs or not doesn't appear to change anything. Deactivating scrubs altogether did not impact performance in any way.
> 
> Furthermore, I'll stress that this has only been happening since we upgraded to the latest Pacific yesterday.

What was your previous release version? What are your OSD drive models?
Are the timeouts always 15s? Not 7s, not 17s?

Thanks,
k
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx