Re: Rocksdb compaction and OSD timeout

Igor Fedotov <igor.fedotov@xxxxxxxx> · Tue, 12 Sep 2023 15:26:45 +0300

Hi All,

As promised, here is a postmortem analysis of what happened.

The following ticket (https://tracker.ceph.com/issues/62815) with its 
accompanying materials provides a low-level overview of the issue.

In a few words it is as follows:

The default hybrid allocator (as well as the AVL one it is based on) can 
take a dramatically long time to allocate fairly large (hundreds of MBs) 
64K-aligned chunks for BlueFS. On the original cluster this showed up as 
20-30 second OSD stalls.
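
To illustrate why fragmentation matters here, below is a toy model in 
Python. It is not the hybrid or AVL allocator code: the linear first-fit 
walk, the function names, and the extent layout are all assumptions made 
for illustration. It only shows that when most free extents are too 
small or misaligned to hold a 64K-aligned piece, a large request has to 
visit a great many extents before it is satisfied.

```python
ALIGN = 64 * 1024  # 64K alignment required for BlueFS chunks


def aligned_usable(offset, length, align=ALIGN):
    """Return (start, usable_len) for the aligned part of a free extent.

    usable_len is 0 when the extent holds no whole aligned unit.
    """
    start = -(-offset // align) * align        # round offset up
    end = (offset + length) // align * align   # round end down
    return start, max(0, end - start)


def allocate(free_extents, want, align=ALIGN):
    """Toy first-fit walk (an assumption, not Ceph's algorithm).

    free_extents: iterable of (offset, length) in offset order.
    want: bytes requested, a multiple of align.
    Returns (pieces, visited): the aligned pieces taken and the
    number of free extents that had to be examined.
    """
    got = 0
    pieces = []
    visited = 0
    for offset, length in free_extents:
        visited += 1
        start, usable = aligned_usable(offset, length, align)
        if usable == 0:
            continue  # fragment cannot hold an aligned unit
        take = min(usable, want - got)
        pieces.append((start, take))
        got += take
        if got >= want:
            break
    return pieces, visited


# Hypothetical aged volume: many 16K holes, then one large free region.
fragments = [(i * 32 * 1024 + 4096, 16 * 1024) for i in range(100_000)]
aged = fragments + [(1 << 40, 512 << 20)]
pieces, visited = allocate(aged, 256 << 20)
print(f"visited {visited} extents to allocate {len(pieces)} piece(s)")
```

On a fresh volume with a single large free extent the same request 
visits one extent; the cost in this model grows with the number of 
unusable fragments in front of the usable space.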

This is apparently not specific to the recent 16.2.14 Pacific release, as 
I saw it at least once before, but 
https://github.com/ceph/ceph/pull/51773 made it more likely to pop up: 
RocksDB can now preallocate huge WALs in a single shot.

The issue is definitely bound to aged/fragmented main OSD volumes that 
also host the DB. I don't expect it to pop up with standalone DB/WAL 
volumes.

As already mentioned in this thread, the proposed work-around is to 
switch bluestore_allocator to bitmap. This might cause a minor overall 
performance drop, so I'm not sure one should apply it unconditionally.
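
For reference, a minimal sketch of the work-around as a ceph.conf 
fragment. It assumes the setting is applied to all OSDs through the 
[osd] section and that the OSDs are restarted afterwards, since to my 
knowledge the allocator is chosen at OSD startup; neither detail is 
spelled out above, so please verify against your own deployment.

```ini
[osd]
# Work-around from this thread: use the bitmap allocator instead of
# the default hybrid one. Assumed to need an OSD restart to take effect.
bluestore_allocator = bitmap
```

With the centralized configuration database the equivalent should be 
`ceph config set osd bluestore_allocator bitmap`, again followed by an 
OSD restart.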

I apologize for the inconvenience this may have caused. 
We're currently working on a proper fix...

Thanks,

Igor

On 07/09/2023 10:05, J-P Methot wrote:
Hi,

We're running the latest Pacific on our production cluster and we've 
been seeing the dreaded "'OSD::osd_op_tp thread 0x7f346aa64700' had 
timed out after 15.000000954s" error. We have reason to believe this 
happens each time the RocksDB compaction process is launched on an 
OSD. My question is: does the cluster detecting that an OSD has timed 
out interrupt the compaction process? This seems to be what's 
happening, but it's not immediately obvious. We are currently facing 
an endless loop of random OSDs timing out, and if the compaction 
process is interrupted without finishing, that may explain it.
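
A small sketch for spotting these timeouts in OSD logs, in case it 
helps correlate them with compaction or allocation stalls. The regular 
expression is derived only from the message quoted above; the function 
name and the assumption that the rest of each log line can be ignored 
are illustrative, not part of any Ceph tooling.

```python
import re

# Matches the fragment quoted above, e.g.
# 'OSD::osd_op_tp thread 0x7f346aa64700' had timed out after 15.000000954s
TIMEOUT_RE = re.compile(
    r"'(?P<pool>[\w:]+) thread (?P<thread>0x[0-9a-fA-F]+)' "
    r"had timed out after (?P<secs>\d+(?:\.\d+)?)s"
)


def find_timeouts(lines):
    """Return (thread_pool, thread_id, seconds) for each matching line."""
    hits = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            hits.append(
                (m.group("pool"), m.group("thread"), float(m.group("secs")))
            )
    return hits
```

Feeding it the lines of an OSD log file (for example 
`find_timeouts(open(path))`, with `path` being whatever log location 
your deployment uses) yields one tuple per timeout report.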

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx