Hi Christian,
yeah, I came to the same idea yesterday: trigger a compaction on upgrade
completion.
See https://github.com/ceph/ceph/pull/42218
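For clusters that have already gone through the upgrade, an online
compaction can also be triggered by hand, for example via the OSD admin
socket (osd.0 is just a placeholder, repeat per OSD):

  # ask a running OSD to compact its RocksDB via the admin socket
  ceph daemon osd.0 compact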
Thanks,
Igor
On 7/8/2021 10:21 AM, Christian Rohmann wrote:
Hey Igor,
On 07/07/2021 14:59, Igor Fedotov wrote:
after an upgrade from Ceph Nautilus to Octopus we ran into extreme
performance issues, leaving the cluster unusable, when deleting a larger
snapshot and the cluster then doing snaptrims,
see e.g. https://tracker.ceph.com/issues/50511#note-13.
Since this was not an issue prior to the upgrade, maybe the OMAP
conversion of the OSDs caused this degradation of the RocksDB data
structures, maybe not. (We were already running bluefs_buffered_io=true,
so that was NOT the issue here.)
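For reference, the value an OSD is actually running with can be checked
like this (osd.0 is just a placeholder):

  # show the effective setting on a running OSD
  ceph config show osd.0 bluefs_buffered_io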
It's hard to say what exactly caused the issue this time. Indeed, the
OMAP conversion could have had some impact, since it performs bulk
removals during the upgrade process - so the DB could have gained enough
critical mass to start lagging.
But I presume this is a one-time effect - it should vaporize after a DB
compaction. Which doesn't mean that snaptrims or any other bulk removals
are absolutely safe from then on, though.
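Should bulk removals become a problem again, snaptrim can at least be
throttled with the usual OSD options, e.g. (the values below are purely
illustrative, tune them to your hardware):

  # slow down snaptrim and limit per-PG concurrency
  ceph config set osd osd_snap_trim_sleep 2
  ceph config set osd osd_pg_max_concurrent_snap_trims 1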
Thank you very much for your quick and extensive reply!
If the OMAP conversion can have this effect, maybe it's sensible to
either trigger an immediate online compaction at the end of the
conversion or at least add this to the upgrade notes. I suppose with
the EoL of Nautilus more and more clusters will now make the jump to
the Octopus release and convert their OSDs' OMAP in the process.
Even if not all clusters' RocksDBs would go over the edge, running a
compaction should not hurt in any case, right?
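For reference, a per-OSD compaction can be run either online via the
admin socket or offline while the OSD is stopped, e.g. (ceph-0 is just a
placeholder for the OSD data directory):

  # offline compaction of the BlueStore RocksDB; stop the OSD first
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact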
Thanks again,
Christian