We changed these settings. Our config is now:
bluestore_rocksdb_options = "compression=kSnappyCompression,max_write_buffer_number=16,min_write_buffer_number_to_merge=3,recycle_log_file_num=16,compaction_style=kCompactionStyleLevel,write_buffer_size=50331648,target_file_size_base=50331648,max_background_compactions=31,level0_file_num_compaction_trigger=4,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,num_levels=5,max_bytes_for_level_base=603979776,max_bytes_for_level_multiplier=10,compaction_threads=32,flusher_threads=8"
It can be changed without redeploying the OSDs. It changes the SST file sizes and when compaction is triggered. The additional improvement is Snappy compression; we rebuilt Ceph with support for it. I can create a PR for it if you want :)
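For reference, one way to roll out a change like this (a sketch, assuming a Mimic/Nautilus-style centralized config; on older releases put the line under [osd] in ceph.conf instead):

    ceph config set osd bluestore_rocksdb_options "<the option string above>"
    systemctl restart ceph-osd@<id>   # restart OSDs one at a time; RocksDB only re-reads these options when the OSD reopens the DB

As far as I understand, compression only applies to SST files written after the restart, so existing data gets compressed gradually as compaction rewrites it.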
Best Regards,
Rafał Wądołowski
Cloud & Security Engineer
On 25.06.2019 22:16, Christian Wuerdig wrote:
The sizes are determined by rocksdb settings - some details can be found here: https://tracker.ceph.com/issues/24361

One thing to note: in this thread http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030775.html it's noted that rocksdb can use up to 100% extra space during compaction, so if you want to avoid spillover during compaction, safer values would be 6/60/600 GB.
You can change max_bytes_for_level_base and max_bytes_for_level_multiplier to suit your needs better, but I'm not sure if that can be changed on the fly or if you have to re-create the OSDs for the new values to apply.
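For what it's worth, the 3/30/300 GB (and 6/60/600 GB) figures roughly fall out of the default level sizing (assuming the stock defaults of max_bytes_for_level_base=268435456, i.e. 256 MB, and max_bytes_for_level_multiplier=10; a custom base or multiplier shifts these numbers accordingly):

    L1 ~ 256 MB
    L2 ~ 2.5 GB
    L3 ~ 25 GB
    L4 ~ 250 GB

A block.db partition only avoids spillover if it can hold a complete set of levels, so the useful sizes are roughly the cumulative sums (~0.3 GB, ~3 GB, ~30 GB, ~300 GB), and doubling those for compaction headroom gives the 6/60/600 GB suggestion above.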
On Tue, 25 Jun 2019 at 18:06, Rafał Wądołowski <rwadolowski@xxxxxxxxxxxxxx> wrote:
Why did you select these specific sizes? Are there any tests/research on it?
Best Regards,
Rafał Wądołowski
On 24.06.2019 13:05, Konstantin Shalygin wrote:
Hi,

I have been thinking a bit about rocksdb and EC pools: since a RADOS object written to an EC(k+m) pool is split into several smaller pieces, the OSD will receive many more small objects than it would in a replicated setup. This must mean that rocksdb also needs to handle that many more entries and will grow faster. This will have an impact when using bluestore on slow HDDs with the DB on SSD drives, where the faster-growing rocksdb might result in spillover to the slow store - if not taken into account when designing the disk layout. Are my thoughts on the right track or am I missing something? Has somebody done any measurements on rocksdb growth, comparing replica vs EC?

If you don't want to be affected by spillover of block.db, use a 3/30/300 GB partition for your block.db.
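As an aside, a quick way to check whether an OSD has already spilled over (a sketch; the exact counter names may vary between releases):

    ceph daemon osd.<id> perf dump | grep -E '"(db|slow)_(total|used)_bytes"'

If slow_used_bytes is greater than zero, part of the DB already lives on the slow device; newer releases also raise a dedicated BlueFS spillover health warning for this.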
k
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com