Re: Collecting BlueStore per Object DB overhead

On Thu, Apr 26, 2018 at 11:36 AM Wido den Hollander <wido@xxxxxxxx> wrote:
Hi,

I've been investigating the per-object overhead for BlueStore, as this
has become a topic for a lot of people who want to store a lot of small
objects in Ceph using BlueStore.

I've written a piece of Python code which can be run on a server
running OSDs and will print the overhead.

https://gist.github.com/wido/b1328dd45aae07c45cb8075a24de9f1f
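
Roughly, the script divides the RocksDB usage by the onode count for
each OSD. A simplified sketch of that calculation is below; the exact
perf counter names (bluefs/db_used_bytes and bluestore/bluestore_onodes)
are an approximation, the gist above is the authoritative version:

import glob
import json
import subprocess

# Simplified sketch: per-object DB overhead = db_used_bytes / onode count.
# The perf counter names below are an approximation of what the gist
# uses; see the gist for the real script.
for sock in glob.glob('/var/run/ceph/ceph-osd.*.asok'):
    osd_id = sock.split('.')[-2]
    perf = json.loads(subprocess.check_output(
        ['ceph', 'daemon', sock, 'perf', 'dump']))
    onodes = perf['bluestore']['bluestore_onodes']
    db_used = perf['bluefs']['db_used_bytes']
    if onodes > 0:
        print('osd.%s _onodes_=%d db_used_bytes=%d overhead=%d'
              % (osd_id, onodes, db_used, db_used // onodes))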

Feedback on this script is welcome, as is the output of what people
are observing.

The results from my tests are below; what I see is that the overhead
seems to range from 10kB to 30kB per object.

On RBD-only clusters the overhead seems to be around 11kB, but on
clusters with an RGW workload it goes up to around 20kB.

That difference seems implausible, as RGW always writes full objects, whereas RBD will frequently write pieces of them and do overwrites.
I'm not sure what knobs are available or which diagnostics BlueStore exports, but is it possible you're looking at the total RocksDB data store rather than the per-object overhead? The distinction being that the RocksDB instance will also store "client" (i.e., RGW) omap data and xattrs, in addition to the actual BlueStore onodes.
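
To make that distinction concrete, a purely illustrative
back-of-the-envelope (every number below is assumed for the example,
not measured):

# Purely illustrative: omap data stored in the same RocksDB inflates
# db_used_bytes / onodes without adding any onodes. All numbers here
# are assumptions, not measurements.
onodes = 200000            # objects (onodes) on the OSD
onode_meta = 11000         # assumed BlueStore metadata bytes per onode
omap_entries = 5000000     # assumed RGW bucket-index entries on this OSD
omap_entry_size = 400      # assumed bytes per omap key/value in RocksDB

db_used = onodes * onode_meta + omap_entries * omap_entry_size
print(db_used / onodes)    # ~21000 bytes of apparent "overhead" per
                           # object, of which only 11000 is onode metadata
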
-Greg
 

I know that partial overwrites and appends contribute to higher
per-object overhead, and I'm trying to investigate this and share my
information with the community.

I have two use cases that want to store >2 billion objects with an
average object size of 50kB (8 - 80kB), and the RocksDB overhead is
likely to become a big problem.
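
As a very rough back-of-the-envelope using the 10kB - 30kB range above
(the x3 replication below is an assumption):

# Rough DB sizing for >2 billion objects at the overhead range observed
# above. The replication factor of 3 is an assumption.
objects = 2e9
replicas = 3
for overhead in (10e3, 20e3, 30e3):
    total = objects * replicas * overhead
    print('%.0fkB/object -> %.0f TB of RocksDB across the cluster'
          % (overhead / 1e3, total / 1e12))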

Is anybody willing to share the overhead they are seeing, and for what
use case?

The more data we have on this, the better we can estimate how DBs need
to be sized for BlueStore deployments.

Wido

# Cluster #1
osd.25 _onodes_=178572 db_used_bytes=2188378112 avg_obj_size=6196529 overhead=12254
osd.20 _onodes_=209871 db_used_bytes=2307915776 avg_obj_size=5452002 overhead=10996
osd.10 _onodes_=195502 db_used_bytes=2395996160 avg_obj_size=6013645 overhead=12255
osd.30 _onodes_=186172 db_used_bytes=2393899008 avg_obj_size=6359453 overhead=12858
osd.1 _onodes_=169911 db_used_bytes=1799356416 avg_obj_size=4890883 overhead=10589
osd.0 _onodes_=199658 db_used_bytes=2028994560 avg_obj_size=4835928 overhead=10162
osd.15 _onodes_=204015 db_used_bytes=2384461824 avg_obj_size=5722715 overhead=11687

# Cluster #2
osd.1 _onodes_=221735 db_used_bytes=2773483520 avg_obj_size=5742992 overhead_per_obj=12508
osd.0 _onodes_=196817 db_used_bytes=2651848704 avg_obj_size=6454248 overhead_per_obj=13473
osd.3 _onodes_=212401 db_used_bytes=2745171968 avg_obj_size=6004150 overhead_per_obj=12924
osd.2 _onodes_=185757 db_used_bytes=3567255552 avg_obj_size=5359974 overhead_per_obj=19203
osd.5 _onodes_=198822 db_used_bytes=3033530368 avg_obj_size=6765679 overhead_per_obj=15257
osd.4 _onodes_=161142 db_used_bytes=2136997888 avg_obj_size=6377323 overhead_per_obj=13261
osd.7 _onodes_=158951 db_used_bytes=1836056576 avg_obj_size=5247527 overhead_per_obj=11551
osd.6 _onodes_=178874 db_used_bytes=2542796800 avg_obj_size=6539688 overhead_per_obj=14215
osd.9 _onodes_=195166 db_used_bytes=2538602496 avg_obj_size=6237672 overhead_per_obj=13007
osd.8 _onodes_=203946 db_used_bytes=3279945728 avg_obj_size=6523555 overhead_per_obj=16082

# Cluster #3
osd.133 _onodes_=68558 db_used_bytes=15868100608 avg_obj_size=14743206 overhead_per_obj=231455
osd.132 _onodes_=60164 db_used_bytes=13911457792 avg_obj_size=14539445 overhead_per_obj=231225
osd.137 _onodes_=62259 db_used_bytes=15597568000 avg_obj_size=15138484 overhead_per_obj=250527
osd.136 _onodes_=70361 db_used_bytes=14540603392 avg_obj_size=13729154 overhead_per_obj=206657
osd.135 _onodes_=68003 db_used_bytes=12285116416 avg_obj_size=12877744 overhead_per_obj=180655
osd.134 _onodes_=64962 db_used_bytes=14056161280 avg_obj_size=15923550 overhead_per_obj=216375
osd.139 _onodes_=68016 db_used_bytes=20782776320 avg_obj_size=13619345 overhead_per_obj=305557
osd.138 _onodes_=66209 db_used_bytes=12850298880 avg_obj_size=14593418 overhead_per_obj=194086
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
