Re: Rocksdb compaction and OSD timeout



We went from 16.2.13 to 16.2.14

Also, the timeout is 15 seconds because that's the default in Ceph: after 15 seconds, Ceph logs a warning that the OSD is timing out.
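For reference, that 15-second default most likely corresponds to `osd_op_thread_timeout` (an assumption on my part; check which option the heartbeat warning in your logs actually names). You can verify the current value like this:

```shell
# Show the configured OSD op thread timeout (default: 15 seconds).
# Assumes the "had timed out after 15" warning maps to osd_op_thread_timeout.
ceph config get osd osd_op_thread_timeout

# Check whether a specific OSD overrides it at runtime:
ceph tell osd.0 config get osd_op_thread_timeout
```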

We may have found the solution, but it would be, in fact, related to bluestore_allocator and not the compaction process. I'll post the actual resolution when we confirm 100% that it works.
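If the culprit really is the allocator, one commonly suggested workaround (an assumption here, not our confirmed resolution) is switching from the Pacific default hybrid allocator to bitmap and restarting the OSDs:

```shell
# Hypothetical workaround sketch: switch the BlueStore allocator
# from the Pacific default (hybrid) to bitmap.
ceph config set osd bluestore_allocator bitmap

# The change only takes effect after each OSD restarts, e.g.:
systemctl restart ceph-osd@0.service
```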

On 9/7/23 12:18, Konstantin Shalygin wrote:

On 7 Sep 2023, at 18:21, J-P Methot <jp.methot@xxxxxxxxxxxxxxxxx> wrote:

Since my post, we've been speaking with a member of the Ceph dev team. He did, at first, believe it was an issue linked to the common performance degradation after huge delete operations. So we did do offline compactions on all our OSDs. It fixed nothing, and we are going through the logs to try and figure this out.
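For anyone following along, the offline compaction we ran looks roughly like this (a sketch assuming the default OSD data path and osd.0 as an example; adjust the ID and path for each OSD):

```shell
# The OSD must be stopped before compacting its RocksDB offline.
systemctl stop ceph-osd@0.service
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact
systemctl start ceph-osd@0.service

# Online alternative (no downtime, but it adds I/O load while running):
ceph tell osd.0 compact
```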

To answer your question, no, the OSD doesn't restart after it logs the timeout. It manages to get back online by itself, at the cost of sluggish performance for the cluster and high iowait on VMs.

We mostly run RBD workloads.

Deep scrubs or no deep scrubs, it doesn't appear to change anything. Deactivating scrubs altogether did not impact performance in any way.
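For completeness, deactivating scrubs cluster-wide is done with the standard OSD flags (this is how one would typically do it; the exact commands we used aren't in the thread):

```shell
# Disable all scrubbing cluster-wide while testing:
ceph osd set noscrub
ceph osd set nodeep-scrub

# Re-enable afterwards:
ceph osd unset noscrub
ceph osd unset nodeep-scrub
```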

Furthermore, I'll stress that this is only happening since we upgraded to the latest Pacific, yesterday.

What was your previous release version? What are your OSD drive models?
Are the timeouts always 15s? Not 7s, not 17s?


Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
