On 16/10/17 13:45, Wido den Hollander wrote:
>> On 26 September 2017 at 16:39, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>>
>> On 09/26/2017 01:10 AM, Dietmar Rieder wrote:
>>> thanks David,
>>>
>>> that's confirming what I was assuming. Too bad that there is no
>>> estimate/method to calculate the db partition size.
>>
>> It's possible that we might be able to get ranges for certain kinds of
>> scenarios. Maybe if you do lots of small random writes on RBD, you can
>> expect a typical metadata size of X per object. Or maybe if you do lots
>> of large sequential object writes in RGW, it's more like Y. I think
>> it's probably going to be tough to make it accurate for everyone though.
>
> So I did a quick test. I wrote 75,000 objects to a BlueStore device:
>
> root@alpha:~# ceph daemon osd.0 perf dump|jq '.bluestore.bluestore_onodes'
> 75085
> root@alpha:~#
>
> I then saw that the RocksDB database was 450 MB in size:
>
> root@alpha:~# ceph daemon osd.0 perf dump|jq '.bluefs.db_used_bytes'
> 459276288
> root@alpha:~#
>
> 459276288 / 75085 = 6116
>
> So about 6 kB of RocksDB data per object.
>
> Let's say I want to store 1M objects on a single OSD; I would then need ~6 GB of DB space.
>
> Is this a safe assumption? Do you think that 6 kB is normal? Low? High?
>
> There aren't many of these numbers out there for BlueStore right now, so I'm trying to gather some.
>
> Wido

If I check the same stats on the OSDs in my production cluster I see similar but variable values:

root@vm-ds-01:~/ceph-conf# for i in {0..9} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.0 db per object: 7490
osd.1 db per object: 7523
osd.2 db per object: 7378
osd.3 db per object: 7447
osd.4 db per object: 7233
osd.5 db per object: 7393
osd.6 db per object: 7074
osd.7 db per object: 7967
osd.8 db per object: 7253
osd.9 db per object: 7680

root@vm-ds-02:~# for i in {10..19} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.10 db per object: 5168
osd.11 db per object: 5291
osd.12 db per object: 5476
osd.13 db per object: 4978
osd.14 db per object: 5252
osd.15 db per object: 5461
osd.16 db per object: 5135
osd.17 db per object: 5126
osd.18 db per object: 9336
osd.19 db per object: 4986

root@vm-ds-03:~# for i in {20..29} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.20 db per object: 5115
osd.21 db per object: 4844
osd.22 db per object: 5063
osd.23 db per object: 5486
osd.24 db per object: 5228
osd.25 db per object: 4966
osd.26 db per object: 5047
osd.27 db per object: 5021
osd.28 db per object: 5321
osd.29 db per object: 5150

root@vm-ds-04:~# for i in {30..39} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.30 db per object: 6658
osd.31 db per object: 6445
osd.32 db per object: 6259
osd.33 db per object: 6691
osd.34 db per object: 6513
osd.35 db per object: 6628
osd.36 db per object: 6779
osd.37 db per object: 6819
osd.38 db per object: 6677
osd.39 db per object: 6689

root@vm-ds-05:~# for i in {40..49} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.40 db per object: 5335
osd.41 db per object: 5203
osd.42 db per object: 5552
osd.43 db per object: 5188
osd.44 db per object: 5218
osd.45 db per object: 5157
osd.46 db per object: 4956
osd.47 db per object: 5370
osd.48 db per object: 5117
osd.49 db per object: 5313

I'm not sure why there is so much variance (these nodes are basically identical), and I think that db_used_bytes includes the WAL, at least in my case, as I don't have a separate WAL device. I'm not sure how big the WAL is relative to the metadata, and hence how much this might throw the numbers off, but ~6 kB/object seems like a reasonable value to use for back-of-envelope calculations (a rough sizing sketch based on that figure follows below).

[bonus hilarity] On my all-in-one-SSD OSDs, because BlueStore reports the whole device as DB space, I get results like:

root@vm-hv-01:~# for i in {60..65} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.60 db per object: 80273
osd.61 db per object: 68859
osd.62 db per object: 45560
osd.63 db per object: 38209
osd.64 db per object: 48258
osd.65 db per object: 50525

Rich
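For completeness, here is a minimal sizing sketch built on the figures above. The ~6 kB/object value comes from the measurements in this thread; the default object count and the 2x headroom factor are assumptions of mine to be replaced with your own numbers, not anything recommended upstream.

#!/bin/bash
# Back-of-envelope BlueStore DB partition sizing from a bytes-per-object estimate.
# All inputs are assumptions; substitute your own measurements.

BYTES_PER_OBJECT=${1:-6144}    # ~6 kB per object, as measured in this thread
OBJECTS_PER_OSD=${2:-1000000}  # expected number of objects on one OSD (assumption)
HEADROOM=2                     # extra room for RocksDB compaction/growth (assumption)

DB_BYTES=$(( BYTES_PER_OBJECT * OBJECTS_PER_OSD * HEADROOM ))
echo "Estimated DB partition size: $(( DB_BYTES / 1024 / 1024 / 1024 )) GiB"

With the defaults this prints ~11 GiB for 1M objects; without the headroom factor it comes out to roughly 5.7 GiB, in line with Wido's ~6 GB estimate.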