Re: Bluestore DB size and onode count

Mark Nelson <mnelson@xxxxxxxxxx> · Mon, 10 Sep 2018 12:26:49 -0500

On 09/10/2018 12:22 PM, Igor Fedotov wrote:

Hi Nick.

On 9/10/2018 1:30 PM, Nick Fisk wrote:
If anybody has 5 minutes, could they just clarify a couple of things 
for me:

1. onode count: should this be equal to the number of objects stored 
on the OSD?
Through reading several posts, there seems to be a general indication 
that this is the case, but looking at my OSDs the maths don't work.
The onode count (bluestore_onodes) is the number of onodes in the 
cache, not the total number of onodes on an OSD.
Hence the difference...

E.g.
ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL  %USE  VAR  PGS
  0   hdd 2.73679  1.00000 2802G  1347G  1454G 48.09 0.69 115

So a 3TB OSD, roughly half full. This is a pure RBD workload (no 
snapshots or anything clever), so let's assume the worst-case scenario 
of 4MB objects (compression is on, however, which would only mean more 
objects for a given size):
1347000 MB / 4 MB = ~336750 expected objects
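The estimate above, written out as a small Python sketch. It assumes every object is a full 4 MB RBD object (the default RBD object size) and uses decimal units (1 GB = 1000 MB) to match the 1347000 figure; `expected_objects` is an illustrative helper, not Ceph tooling.

```python
def expected_objects(used_gb: float, object_size_mb: float = 4.0) -> int:
    """Estimate object count as used space / object size.

    Assumes every object is full-sized, so this is a lower bound:
    partially written objects and compression (USE is post-compression
    allocated space) both push the real count higher.
    """
    used_mb = used_gb * 1000  # decimal units, as in the thread
    return int(used_mb // object_size_mb)


print(expected_objects(1347))  # osd.0: 1347G used
```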

sudo ceph daemon osd.0 perf dump | grep blue
     "bluefs": {
     "bluestore": {
         "bluestore_allocated": 1437813964800,
         "bluestore_stored": 2326118994003,
         "bluestore_compressed": 445228558486,
         "bluestore_compressed_allocated": 547649159168,
         "bluestore_compressed_original": 1437773843456,
         "bluestore_onodes": 99022,
         "bluestore_onode_hits": 18151499,
         "bluestore_onode_misses": 4539604,
         "bluestore_onode_shard_hits": 10596780,
         "bluestore_onode_shard_misses": 4632238,
         "bluestore_extents": 896365,
         "bluestore_blobs": 861495,

99022 onodes, anyone care to enlighten me?
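Since bluestore_onodes only counts cached onodes, the other counters in this dump are more useful for a sanity check. A minimal Python sketch, assuming the JSON layout of `ceph daemon osd.N perf dump` (a top-level "bluestore" object, as the grep output suggests) and assuming bluestore_stored counts logical, pre-compression bytes (consistent with it being larger than bluestore_allocated here). The helper names are hypothetical.

```python
import json

# Excerpt of the "bluestore" section quoted above (other counters omitted).
PERF_DUMP = """
{"bluestore": {"bluestore_stored": 2326118994003,
               "bluestore_onodes": 99022,
               "bluestore_onode_hits": 18151499,
               "bluestore_onode_misses": 4539604}}
"""


def onode_cache_hit_ratio(bluestore: dict) -> float:
    """Fraction of onode lookups served from the in-memory onode cache."""
    hits = bluestore["bluestore_onode_hits"]
    misses = bluestore["bluestore_onode_misses"]
    total = hits + misses
    return hits / total if total else 0.0


def objects_from_stored(bluestore: dict, object_size: int = 4 * 1024 * 1024) -> int:
    """Rough object count from logical bytes stored, assuming full 4 MiB objects."""
    return bluestore["bluestore_stored"] // object_size


bs = json.loads(PERF_DUMP)["bluestore"]
print(f"cached onodes: {bs['bluestore_onodes']}")
print(f"onode cache hit ratio: {onode_cache_hit_ratio(bs):.1%}")
print(f"objects implied by bluestore_stored: {objects_from_stored(bs)}")
```

Using the logical byte count rather than USE from `ceph osd df` reflects the point that compression means more objects for a given amount of allocated space.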

2. block.db Size
sudo ceph daemon osd.0 perf dump | grep db
         "db_total_bytes": 8587829248,
         "db_used_bytes": 2375024640,

2.3GB = 0.17% of data size. This seems a lot lower than the 1% 
recommendation (10GB for every 1TB) or the 4% given in the official 
docs. I know that different workloads will have differing overheads and 
potentially smaller objects, but am I understanding these figures 
correctly, as they seem dramatically lower?
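The percentages being compared can be reproduced with a short Python sketch. It assumes the G in `ceph osd df` output means GiB (with decimal GB the ratio comes out slightly higher, around the 0.17% quoted) and treats the 1% and 4% guidelines as fractions of the raw OSD size; the helper names are illustrative only.

```python
GIB = 1024 ** 3


def db_ratio(db_used_bytes: int, data_used_bytes: int) -> float:
    """DB space used as a fraction of data space used on the OSD."""
    return db_used_bytes / data_used_bytes


def rule_of_thumb_db_size(osd_size_bytes: int, fraction: float) -> int:
    """DB volume size suggested by a 'fraction of OSD size' guideline."""
    return int(osd_size_bytes * fraction)


db_used = 2375024640      # db_used_bytes from perf dump
data_used = 1347 * GIB    # USE column from `ceph osd df` (assumed GiB)
osd_size = 2802 * GIB     # SIZE column from `ceph osd df` (assumed GiB)

print(f"DB/data ratio: {db_ratio(db_used, data_used):.2%}")
for frac in (0.01, 0.04):
    size = rule_of_thumb_db_size(osd_size, frac)
    print(f"{frac:.0%} guideline -> {size / GIB:.0f} GiB DB volume")
```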
Just in case: is slow_used_bytes equal to 0? Some DB data might 
reside on the slow device if spillover has happened, and spillover 
doesn't require the DB volume to be full; that's by RocksDB's design.

And the recommended numbers are a bit... speculative, so it's quite 
possible that your numbers are absolutely adequate.
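A minimal Python sketch of the slow_used_bytes check, assuming the counter sits in the "bluefs" section of the perf dump alongside db_used_bytes. In practice the dict would come from parsing the output of `ceph daemon osd.N perf dump`; here a sample is inlined so it runs without a cluster. The slow_used_bytes value in the sample is a placeholder, since the thread does not show it, and `bluefs_spillover` is a hypothetical helper.

```python
import json
from typing import Tuple


def bluefs_spillover(perf_dump: dict) -> Tuple[bool, int]:
    """Return (spilled, slow_used_bytes) from a parsed perf dump.

    A non-zero slow_used_bytes means some RocksDB data lives on the
    slow (main) device instead of block.db.
    """
    slow_used = perf_dump.get("bluefs", {}).get("slow_used_bytes", 0)
    return slow_used > 0, slow_used


# db_* values are from the thread; slow_used_bytes = 0 is a placeholder.
sample = json.loads(
    '{"bluefs": {"db_total_bytes": 8587829248,'
    ' "db_used_bytes": 2375024640, "slow_used_bytes": 0}}'
)
spilled, slow = bluefs_spillover(sample)
print("spillover" if spilled else "no spillover", slow)
```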

FWIW, these are the numbers I came up with after examining the SST files 
generated under different workloads:

https://drive.google.com/file/d/1Ews2WR-y5k3TMToAm0ZDsm7Gf_fwvyFw/view?usp=sharing

Regards,
Nick

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
