Re: How to minimise the impact of compaction in ‘rocksdb options’?

Hi Mark,

I've picked some OSDs that reported failures today.
Here is the beginning of the log parser output; how does it look?


Compaction Statistics   /var/log/ceph/ceph-osd.26.log
Total OSD Log Duration (seconds)        41766.799
Number of Compaction Events     90
Avg Compaction Time (seconds)   3.934
Total Compaction Time (seconds) 354.074
Avg Output Size (MB)    557.684
Total Output Size (MB)  50191.578
Total Input Records     447697809
Total Output Records    430832247
Avg Output Throughput (MB/s)    145.053
Avg Input Records/second        1433801.860
Avg Output Records/second       1211797.185
Avg Output/Input Ratio  0.949


And another one:
Compaction Statistics   /var/log/ceph/ceph-osd.19.log
Total OSD Log Duration (seconds)        42004.520
Number of Compaction Events     23
Avg Compaction Time (seconds)   4.212
Total Compaction Time (seconds) 96.875
Avg Output Size (MB)    590.111
Total Output Size (MB)  13572.558
Total Input Records     68160780
Total Output Records    66981318
Avg Output Throughput (MB/s)    123.396
Avg Input Records/second        952025.191
Avg Output Records/second       910383.799
Avg Output/Input Ratio  0.961
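
If I'm reading these numbers right, the fraction of wall-clock time spent compacting is quite small: 354.074 / 41766.799 ≈ 0.85% for osd.26, and 96.875 / 42004.520 ≈ 0.23% for osd.19.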

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

-----Original Message-----
From: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx> 
Sent: Tuesday, November 16, 2021 1:33 AM
To: Mark Nelson <mnelson@xxxxxxxxxx>
Cc: ceph-users@xxxxxxx
Subject:  Re: How to minimise the impact of compaction in ‘rocksdb options’?

I’ll give the script a try.

Yes, I only have RGW, with data on EC 4:2, and huge amounts of small objects.

One bucket has 1.2 billion objects, another has 300 million, and there are other buckets besides.

Most of the slow ops time is spent in “waiting for readable” on the PG, sometimes for minutes.

All my OSDs spilled over when I used an NVMe device in front of the SAS SSDs as WAL+DB, so I migrated the WAL+DB back to the block device to avoid the spillover.
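
(For reference, a migration like that is typically done with ceph-volume; the ID, fsid and LV names below are placeholders, not my actual devices:)

# Sketch only: move BlueFS DB/WAL data from the fast device back onto
# the main block device. All identifiers are placeholders.
ceph-volume lvm migrate --osd-id <id> --osd-fsid <fsid> --from db wal --target <vg>/<block-lv>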

So what would be the best practice/solution to handle it?

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

On 2021. Nov 15., at 17:58, Mark Nelson <mnelson@xxxxxxxxxx> wrote:


Hi,


Compaction can block reads, but on the write path you should be able to absorb a certain amount of writes via the WAL before RocksDB starts throttling them.  The larger and more numerous your WAL buffers, the more writes you can absorb; however, bigger buffers take more CPU to keep in sorted order, and more aggregate buffer space consumes more RAM, so it's a double-edged sword.  I'd suggest measuring how much time you actually spend in compaction.  For clusters that primarily serve block via RBD, there's a good chance it's fairly minimal.  For RGW (especially with lots of small objects and/or erasure coding) you may spend more time in compaction, but it's important to measure it.
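
Just as an illustration (the numbers below are arbitrary and would need testing on your hardware; also note that setting bluestore_rocksdb_options replaces the entire default option string rather than merging with it), the WAL buffer knobs live in bluestore_rocksdb_options:

# Illustration only: bigger/more memtables absorb more writes before
# throttling, at the cost of CPU (sorted inserts) and RAM (aggregate buffer).
[osd]
bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=8,min_write_buffer_number_to_merge=2,write_buffer_size=268435456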


FWIW, you can try running the following script against your OSD log to see a summary of compaction events:

https://github.com/ceph/cbt/blob/master/tools/ceph_rocksdb_log_parser.py
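
Usage sketch (assuming the script just takes the OSD log path as its argument, which matches the header it prints):

python3 ceph_rocksdb_log_parser.py /var/log/ceph/ceph-osd.0.log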


Mark


On 11/15/21 10:48 AM, Szabo, Istvan (Agoda) wrote:
Hello,

If I’m not mistaken, in my cluster this can block I/O on an OSD if that specific OSD holds a huge number of objects.

How can I change the values to minimise the impact?

I guess it needs an OSD restart to take effect, and the “rocksdb options” are the values that need to be tuned, but what should they be changed to?
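
(For reference, I assume the currently active option string can be dumped from the admin socket before changing anything:

ceph daemon osd.0 config get bluestore_rocksdb_options

so at least the starting values are visible.)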

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



