block.db is very unlikely to ever grow to 250GB with a 6TB data device.
However, there is a funny "issue": all block.db sizes except roughly 4,
30, and 286 GB are effectively useless, because RocksDB only puts data
on the fast storage if it thinks the whole LSM level will fit there.
Ceph's RocksDB options set the WAL to 1GB and leave the default
max_bytes_for_level_base unchanged at 256MB; the level multiplier is
also left at its default of 10. So WAL=1GB, L1=256MB, L2=2560MB,
L3=25600MB. RocksDB will therefore put L2 on the block.db only if the
block.db's size exceeds 1GB + 256MB + 2560MB (which rounds up to 4GB),
and it will put L3 there only if its size exceeds
1GB + 256MB + 2560MB + 25600MB, i.e. almost 30GB.
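To make the arithmetic concrete, here is a minimal Python sketch of the
same calculation (assuming Ceph's defaults of a 1GB WAL,
max_bytes_for_level_base = 256MB and a level multiplier of 10, and
treating 1GB as 1000MB for simplicity):

wal_gb = 1.0            # WAL reserved on the fast device
level_gb = 0.256        # L1 target size (max_bytes_for_level_base = 256MB)
multiplier = 10         # max_bytes_for_level_multiplier

# A level only lands on block.db if the WAL plus all levels up to and
# including it fit there, so print the cumulative thresholds.
cumulative_gb = wal_gb
for level in range(1, 5):
    cumulative_gb += level_gb
    print(f"L{level} fits only if block.db > ~{cumulative_gb:.1f} GB")
    level_gb *= multiplier

This prints thresholds of roughly 1.3, 3.8, 29.4 and 285.4 GB, which
round to the 4, 30 and 286 GB figures above. A 250GB block.db therefore
clears the L3 threshold but falls well short of the L4 one, and
everything beyond L3 stays on the slow device.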
Hello,
I was wondering why my Ceph block.db is nearly empty, so I started
to investigate.
Ceph's recommendation is that block.db should be at least
4% of the size of block. So my OSD configuration looks like this:
wal.db - not explicitly specified
block.db - 250GB of SSD storage
block - 6TB
Since the WAL is written to block.db when no separate WAL device is
specified, I didn't configure one. At 250GB we are slightly above the
4% mark, as the quick check below shows.
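For reference, here is the 4% guideline worked out against my layout
(decimal GB assumed):

block_gb = 6000              # 6 TB data device
db_gb = 250                  # provisioned block.db partition

# The guideline recommends block.db >= 4% of block.
recommended_gb = 0.04 * block_gb
print(f"guideline: {recommended_gb:.0f} GB, provisioned: {db_gb} GB "
      f"= {100 * db_gb / block_gb:.2f}% of block")

This prints "guideline: 240 GB, provisioned: 250 GB = 4.17% of block".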
So everything should be "fine". But the block.db only contains
about 10GB of data.
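To see what actually lands there, one can read the BlueFS counters from
the OSD's admin socket. A minimal sketch (the OSD id is hypothetical,
and the counter names db_used_bytes / db_total_bytes / slow_used_bytes
are what recent BlueStore releases expose, so check your version):

import json
import subprocess

def db_usage(osd_id: int) -> None:
    # "ceph daemon osd.N perf dump" must be run on the host of that OSD.
    out = subprocess.check_output(
        ["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"])
    bluefs = json.loads(out)["bluefs"]
    used_gb = bluefs["db_used_bytes"] / 1e9
    total_gb = bluefs["db_total_bytes"] / 1e9
    slow_gb = bluefs.get("slow_used_bytes", 0) / 1e9
    print(f"osd.{osd_id}: db {used_gb:.1f}/{total_gb:.1f} GB used, "
          f"{slow_gb:.1f} GB spilled to the slow device")

db_usage(0)  # hypothetical OSD id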
I figured out that an object in block.db gets "amplified", so
the space consumption is much higher than what the object itself
would need.
I'm using Ceph as the storage backend for OpenStack, and raw images
of 10GB and more are common. So if I understand this correctly, I
have to expect that a single 10GB image may consume 100GB of
block.db.
Besides the fact that an image may be 100GB in size, and that images
are only used for initial reads until all changed blocks have been
written to an SSD-only pool, I was asking myself whether I need a
block.db at all, or whether it would be better to save the SSD space
used for block.db and just create a 10GB wal.db instead.
Has anyone done this before? Has anyone had sufficient SSD space
available but stuck with just a wal.db to save SSD space?
If I'm correct, the block.db will never be used for huge images.
And even if it were used for one or two images, would that make
sense? The images are initially used to read all unchanged blocks;
after a while each VM should access the images pool less and less
due to the changes made within the VM.
Any thoughts about this?
Best regards
--
Benjamin Zapiec <benjamin.zapiec@xxxxxxxxxx> (System Engineer)
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com