Re: Thoughts on rocksdb and erasurecode

Hm, according to https://tracker.ceph.com/issues/24025, snappy compression should be available out of the box at least since Luminous. What Ceph version are you running?

On Wed, 26 Jun 2019 at 21:51, Rafał Wądołowski <rwadolowski@xxxxxxxxxxxxxx> wrote:

We changed these settings. Our config now is:

bluestore_rocksdb_options = "compression=kSnappyCompression,max_write_buffer_number=16,min_write_buffer_number_to_merge=3,recycle_log_file_num=16,compaction_style=kCompactionStyleLevel,write_buffer_size=50331648,target_file_size_base=50331648,max_background_compactions=31,level0_file_num_compaction_trigger=4,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,num_levels=5,max_bytes_for_level_base=603979776,max_bytes_for_level_multiplier=10,compaction_threads=32,flusher_threads=8"

These options can be changed without redeploying; the SST files are rewritten once compaction is triggered. The additional improvement is Snappy compression. We rebuilt Ceph with support for it. I can create a PR for it, if you want :)
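
As an illustrative aside (just a throwaway helper, not anything from Ceph itself), a short Python sketch like the one below can parse such an option string into a dict, which makes it easier to inspect or tweak a single knob and paste the result back into ceph.conf; the tweak shown is purely for demonstration:

# Illustrative helper (not a Ceph API): parse a bluestore_rocksdb_options
# string into a dict, tweak one knob, and serialize it back.

OPTS = ("compression=kSnappyCompression,max_write_buffer_number=16,"
        "min_write_buffer_number_to_merge=3,recycle_log_file_num=16,"
        "compaction_style=kCompactionStyleLevel,write_buffer_size=50331648,"
        "target_file_size_base=50331648,max_background_compactions=31,"
        "level0_file_num_compaction_trigger=4,level0_slowdown_writes_trigger=32,"
        "level0_stop_writes_trigger=64,num_levels=5,"
        "max_bytes_for_level_base=603979776,max_bytes_for_level_multiplier=10,"
        "compaction_threads=32,flusher_threads=8")

def parse_opts(s):
    # Split "k1=v1,k2=v2,..." into a dict (insertion-ordered on Python 3.7+).
    return dict(kv.split("=", 1) for kv in s.split(",") if kv)

def dump_opts(d):
    # Serialize back into the comma-separated form used in ceph.conf.
    return ",".join(f"{k}={v}" for k, v in d.items())

opts = parse_opts(OPTS)
print(opts["compression"])                 # kSnappyCompression
opts["max_background_compactions"] = "8"   # hypothetical tweak, just to show round-tripping
print(dump_opts(opts))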


Best Regards,

Rafał Wądołowski
Cloud & Security Engineer

On 25.06.2019 22:16, Christian Wuerdig wrote:
The sizes are determined by rocksdb settings - some details can be found here: https://tracker.ceph.com/issues/24361
One thing to note: in this thread http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030775.html it's noted that rocksdb can use up to 100% extra space during compaction, so if you want to avoid spillover during compaction, safer values would be 6/60/600 GB.

You can change max_bytes_for_level_base and max_bytes_for_level_multiplier to suit your needs better, but I'm not sure whether they can be changed on the fly or whether you have to re-create the OSDs for them to apply.
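
For what it's worth, here is the back-of-the-envelope arithmetic behind those 3/30/300 and 6/60/600 GB figures as a small Python sketch (my own rough calculation, assuming the usual RocksDB defaults of max_bytes_for_level_base=256 MiB and max_bytes_for_level_multiplier=10; Rafał's config above uses a larger base, which shifts the boundaries):

# Rough sketch: cumulative RocksDB level sizes for a given base/multiplier,
# with and without ~100% temporary overhead during compaction.
# These are approximations, not an official sizing formula.

GiB = 1024 ** 3

def level_sizes(base_bytes, multiplier, levels):
    # Target size of L1..Ln: L1 = base, each further level = previous * multiplier.
    return [base_bytes * multiplier ** i for i in range(levels)]

base = 256 * 1024 ** 2   # 256 MiB, the default max_bytes_for_level_base
mult = 10                # the default max_bytes_for_level_multiplier

cumulative = 0
for n, size in enumerate(level_sizes(base, mult, 4), start=1):
    cumulative += size
    print(f"up to L{n}: ~{cumulative / GiB:.1f} GiB "
          f"(~{2 * cumulative / GiB:.1f} GiB with compaction headroom)")

# up to L1: ~0.2 GiB (~0.5 GiB with compaction headroom)
# up to L2: ~2.8 GiB (~5.5 GiB with compaction headroom)
# up to L3: ~27.8 GiB (~55.5 GiB with compaction headroom)
# up to L4: ~277.8 GiB (~555.5 GiB with compaction headroom)

As I understand it, a DB partition only pays off up to the highest level it can hold entirely, which is why those round numbers keep coming up.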

On Tue, 25 Jun 2019 at 18:06, Rafał Wądołowski <rwadolowski@xxxxxxxxxxxxxx> wrote:

Why did you select these specific sizes? Are there any tests or research behind them?


Best Regards,

Rafał Wądołowski

On 24.06.2019 13:05, Konstantin Shalygin wrote:

Hi

I've been thinking a bit about rocksdb and EC pools:

Since a RADOS object written to an EC(k+m) pool is split into several smaller pieces, the OSDs will receive many more, smaller objects than they would in a replicated setup.

This must mean that rocksdb also has to handle many more entries and will grow faster. This will have an impact when using bluestore on slow HDDs with the DB on SSD, where the faster-growing rocksdb might spill over to the slow store if this is not taken into account when designing the disk layout.

Are my thoughts on the right track, or am I missing something?

Has anybody done any measurements of rocksdb growth comparing replica vs EC?

If you don't want to be affected by block.db spillover, use a 3/30/300 GB partition for your block.db.
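
To put some toy numbers on the original question (a rough illustration only, ignoring striping and padding details), here is how the count and size of stored pieces compares for the same logical data under 3x replication versus EC 8+3:

# Toy comparison (assumptions, not measurements): how many on-disk pieces
# (and hence onodes / rocksdb entries) a cluster stores for the same
# logical data under replication versus erasure coding.

def stored_pieces(num_objects, obj_size, scheme):
    # Return (total pieces across the cluster, approximate size of each piece).
    if scheme[0] == "replica":
        _, copies = scheme
        return num_objects * copies, obj_size
    else:                       # ("ec", k, m)
        _, k, m = scheme
        return num_objects * (k + m), obj_size / k

num_objects = 1_000_000         # 1M logical 4 MiB RADOS objects, ~4 TiB of data
obj_size = 4 * 1024 ** 2

for scheme in [("replica", 3), ("ec", 8, 3)]:
    pieces, piece_size = stored_pieces(num_objects, obj_size, scheme)
    print(scheme, f"-> {pieces:,} pieces of ~{piece_size / 1024:.0f} KiB each")

# ('replica', 3) -> 3,000,000 pieces of ~4096 KiB each
# ('ec', 8, 3) -> 11,000,000 pieces of ~512 KiB each

So the same logical data turns into roughly 3.7x as many stored pieces with 8+3, each carrying its own metadata entry, which matches the intuition that the DB grows faster per terabyte on EC pools.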



k


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
