Hi,
We're running the latest Pacific release on our production cluster and
we've been seeing the dreaded "'OSD::osd_op_tp thread 0x7f346aa64700' had
timed out after 15.000000954s" error. We have reason to believe this
happens each time RocksDB compaction is launched on an OSD. My question
is, does the cluster detecting that an OSD has timed out interrupt the
compaction process? This seems to be what's happening, but it's not
immediately obvious. We are currently stuck in a seemingly endless cycle
of random OSDs timing out, and if compaction is being interrupted before
it finishes, that could explain it.
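
For anyone wanting to check the same correlation on their own logs, a
minimal Python sketch for extracting these timeout messages could look
like the following. The regular expression is built only from the message
quoted above; the function names are hypothetical, and matching against
compaction events is left out because it depends on how your logs record
them.

```python
import re
from collections import Counter

# Pattern derived from the quoted message only. The rest of the log line
# (timestamp, daemon name, etc.) is deliberately not assumed.
TIMEOUT_RE = re.compile(
    r"'(?P<pool>[\w:]+) thread (?P<thread>0x[0-9a-fA-F]+)' "
    r"had timed out after (?P<seconds>\d+(?:\.\d+)?)s"
)


def parse_timeouts(lines):
    """Return (pool, thread, seconds) for each line containing the message."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append(
                (match["pool"], match["thread"], float(match["seconds"]))
            )
    return hits


def count_by_thread(hits):
    """Count how many timeout messages were seen per thread id."""
    return Counter(thread for _, thread, _ in hits)
```

Feeding it the lines of an OSD log (for example, an open file object)
gives a list that can then be compared by hand against the times at which
compaction was running.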
--
Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx