Re: Rocksdb compaction and OSD timeout

J-P Methot <jp.methot@xxxxxxxxxxxxxxxxx> · Thu, 7 Sep 2023 05:07:49 -0400

We're talking about automatic online compaction here, not compaction 
triggered by running the command.

On 9/7/23 04:04, Konstantin Shalygin wrote:
Hi,

On 7 Sep 2023, at 10:05, J-P Methot <jp.methot@xxxxxxxxxxxxxxxxx> wrote:

We're running the latest Pacific release on our production cluster and 
we've been seeing the dreaded "'OSD::osd_op_tp thread 0x7f346aa64700' 
had timed out after 15.000000954s" error. We have reason to believe this 
happens each time the RocksDB compaction process is launched on an 
OSD. My question is: when the cluster detects that an OSD has timed 
out, does that interrupt the compaction process? This seems to be what's 
happening, but it's not immediately obvious. We are currently facing 
an endless loop of random OSDs timing out, and if the compaction 
process is interrupted without finishing, that may explain it.
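
One way to check whether the timeouts really line up with compaction is to pull every occurrence of the message out of the OSD log first. The following is only a minimal sketch: the regular expression is built from the message text quoted above, and the helper name `find_op_tp_timeouts` and the rest of the log-line layout are assumptions, not anything Ceph provides.

```python
import re

# Pattern derived from the timeout message quoted in this thread.
# Anything before or after the message on the log line is ignored,
# since the exact line layout is an assumption here.
TIMEOUT_RE = re.compile(
    r"'OSD::osd_op_tp thread (0x[0-9a-f]+)' had timed out after ([0-9.]+)s"
)


def find_op_tp_timeouts(lines):
    """Return a (thread_id, seconds) tuple for each osd_op_tp timeout found."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((match.group(1), float(match.group(2))))
    return hits
```

Feeding it the lines of an OSD log (for example `find_op_tp_timeouts(open(path))`, with `path` being wherever the log lives on your hosts) gives a list that can then be compared against the times compaction starts on the same OSD.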

You're running online compaction for these OSDs (the `ceph osd compact 
${osd_id}` command), right?

k

--
Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.