Hello, here are my results.

In this node I have 3 OSDs (1TB HDD); osd.1 and osd.2 each have block.db on a 90GB SSD partition, while osd.8 has no separate block.db:

pve-hs-main[0]:~$ for i in {1,2,8} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.1 db per object: 20872
osd.2 db per object: 20416
osd.8 db per object: 16888
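The same check, generalized so the OSD IDs don't have to be hard-coded on each node. This is only a rough sketch: it assumes the default admin socket path /var/run/ceph/ceph-osd.<id>.asok and that jq is installed.

    # Sketch: BlueFS DB bytes per onode for every OSD with an admin socket on this host.
    for sock in /var/run/ceph/ceph-osd.*.asok ; do
        id=$(basename "$sock" .asok | cut -d. -f2)     # numeric OSD id from the socket name
        dump=$(ceph daemon "osd.$id" perf dump)        # take one dump and reuse it for both counters
        db=$(echo "$dump" | jq '.bluefs.db_used_bytes')
        onodes=$(echo "$dump" | jq '.bluestore.bluestore_onodes')
        echo "osd.$id db per object: $(( db / onodes ))"
    done

Same numbers as the loops below, just less typing per node.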
In this node I have 3 OSDs (1TB HDD), each with a 60GB block.db on a separate SSD:

pve-hs-2[0]:/$ for i in {3..5} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.3 db per object: 19053
osd.4 db per object: 18742
osd.5 db per object: 14979
In this node I have 3 OSDs (1TB HDD) with no separate SSD:

pve-hs-3[0]:~$ for i in {0,6,7} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.0 db per object: 27392
osd.6 db per object: 54065
osd.7 db per object: 69986

My ceph df and rados df, if they can be useful:

pve-hs-3[0]:~$ ceph df detail
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED     OBJECTS
    8742G     6628G        2114G         24.19        187k
POOLS:
    NAME           ID     QUOTA OBJECTS     QUOTA BYTES     USED       %USED     MAX AVAIL     OBJECTS     DIRTY     READ      WRITE     RAW USED
    cephbackup     9      N/A               N/A             469G       7.38      2945G         120794      117k      759k      2899k     938G
    cephwin        13     N/A               N/A             73788M     1.21      1963G         18711       18711     1337k     1637k     216G
    cephnix        14     N/A               N/A             201G       3.31      1963G         52407       52407     791k      1781k     605G

pve-hs-3[0]:~$ rados df detail
POOL_NAME      USED       OBJECTS     CLONES     COPIES     MISSING_ON_PRIMARY     UNFOUND     DEGRADED     RD_OPS      RD         WR_OPS      WR
cephbackup     469G       120794      0          241588     0                      0           0            777872      7286M      2968926     718G
cephnix        201G       52407       0          157221     0                      0           0            810317      67057M     1824184     242G
cephwin        73788M     18711       0          56133      0                      0           0            1369792     155G       1677060     136G

total_objects    191912
total_used       2114G
total_avail      6628G
total_space      8742G

Can someone see a pattern?
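For what it's worth, a back-of-envelope sizing from these numbers. The object count and bytes-per-object below are only placeholders (roughly what my HDD+SSD OSDs above show), not recommendations:

    # Sketch: estimated block.db usage = expected objects per OSD * observed DB bytes per object.
    objects_per_osd=1000000     # placeholder: expected number of objects on one OSD
    bytes_per_object=20000      # placeholder: observed DB bytes per onode from the loops above
    echo "estimated block.db usage: $(( objects_per_osd * bytes_per_object / 1024 / 1024 / 1024 )) GiB"

With ~20kB per object that comes out to roughly 18-19 GiB per million objects (the integer shell arithmetic above rounds down).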
On 17/10/2017 08:54, Wido den Hollander wrote:
> On 16 October 2017 at 18:14, Richard Hesketh <richard.hesketh@xxxxxxxxxxxx> wrote:
>> On 16/10/17 13:45, Wido den Hollander wrote:
>>> On 26 September 2017 at 16:39, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>>>> On 09/26/2017 01:10 AM, Dietmar Rieder wrote:
>>>>> Thanks David, that's confirming what I was assuming. Too bad that there
>>>>> is no estimate/method to calculate the db partition size.
>>>>
>>>> It's possible that we might be able to get ranges for certain kinds of
>>>> scenarios. Maybe if you do lots of small random writes on RBD, you can
>>>> expect a typical metadata size of X per object. Or maybe if you do lots
>>>> of large sequential object writes in RGW, it's more like Y. I think it's
>>>> probably going to be tough to make it accurate for everyone though.
>>>
>>> So I did a quick test. I wrote 75,000 objects to a BlueStore device:
>>>
>>> root@alpha:~# ceph daemon osd.0 perf dump|jq '.bluestore.bluestore_onodes'
>>> 75085
>>> root@alpha:~#
>>>
>>> I then saw the RocksDB database was 450MB in size:
>>>
>>> root@alpha:~# ceph daemon osd.0 perf dump|jq '.bluefs.db_used_bytes'
>>> 459276288
>>> root@alpha:~#
>>>
>>> 459276288 / 75085 = 6116
>>>
>>> So about 6kB of RocksDB data per object. Let's say I want to store 1M
>>> objects in a single OSD, I would need ~6GB of DB space. Is this a safe
>>> assumption? Do you think that 6kB is normal? Low? High?
>>>
>>> There aren't many of these numbers out there for BlueStore right now, so
>>> I'm trying to gather some numbers.
>>>
>>> Wido
>>
>> If I check for the same stats on OSDs in my production cluster I see
>> similar but variable values:
>>
>> root@vm-ds-01:~/ceph-conf# for i in {0..9} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
>> osd.0 db per object: 7490
>> osd.1 db per object: 7523
>> osd.2 db per object: 7378
>> osd.3 db per object: 7447
>> osd.4 db per object: 7233
>> osd.5 db per object: 7393
>> osd.6 db per object: 7074
>> osd.7 db per object: 7967
>> osd.8 db per object: 7253
>> osd.9 db per object: 7680
>>
>> root@vm-ds-02:~# for i in {10..19} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
>> osd.10 db per object: 5168
>> osd.11 db per object: 5291
>> osd.12 db per object: 5476
>> osd.13 db per object: 4978
>> osd.14 db per object: 5252
>> osd.15 db per object: 5461
>> osd.16 db per object: 5135
>> osd.17 db per object: 5126
>> osd.18 db per object: 9336
>> osd.19 db per object: 4986
>>
>> root@vm-ds-03:~# for i in {20..29} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
>> osd.20 db per object: 5115
>> osd.21 db per object: 4844
>> osd.22 db per object: 5063
>> osd.23 db per object: 5486
>> osd.24 db per object: 5228
>> osd.25 db per object: 4966
>> osd.26 db per object: 5047
>> osd.27 db per object: 5021
>> osd.28 db per object: 5321
>> osd.29 db per object: 5150
>>
>> root@vm-ds-04:~# for i in {30..39} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
>> osd.30 db per object: 6658
>> osd.31 db per object: 6445
>> osd.32 db per object: 6259
>> osd.33 db per object: 6691
>> osd.34 db per object: 6513
>> osd.35 db per object: 6628
>> osd.36 db per object: 6779
>> osd.37 db per object: 6819
>> osd.38 db per object: 6677
>> osd.39 db per object: 6689
>>
>> root@vm-ds-05:~# for i in {40..49} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
>> osd.40 db per object: 5335
>> osd.41 db per object: 5203
>> osd.42 db per object: 5552
>> osd.43 db per object: 5188
>> osd.44 db per object: 5218
>> osd.45 db per object: 5157
>> osd.46 db per object: 4956
>> osd.47 db per object: 5370
>> osd.48 db per object: 5117
>> osd.49 db per object: 5313
>>
>> I'm not sure why there is so much variance (these nodes are basically
>> identical), and I think that db_used_bytes includes the WAL, at least in
>> my case, as I don't have a separate WAL device. I'm not sure how big the
>> WAL is relative to metadata and hence how much this might be thrown off,
>> but ~6kB/object seems like a reasonable value to take for back-of-envelope
>> calculating.
>
> Yes, judging from your numbers 6kB/object seems reasonable.
>
> More datapoints are welcome in this case. Some input from a BlueStore dev
> might be helpful as well to see we are not drawing the wrong conclusions
> here.
>
> Wido
>
>> [bonus hilarity]
>> On my all-in-one-SSD OSDs, because BlueStore reports them entirely as db
>> space, I get results like:
>>
>> root@vm-hv-01:~# for i in {60..65} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
>> osd.60 db per object: 80273
>> osd.61 db per object: 68859
>> osd.62 db per object: 45560
>> osd.63 db per object: 38209
>> osd.64 db per object: 48258
>> osd.65 db per object: 50525
>>
>> Rich
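On the question of whether db_used_bytes includes the WAL: if I read the perf counters correctly, the bluefs section also exposes wal_used_bytes and slow_used_bytes next to db_used_bytes, so something like the line below (an untested sketch, assuming those counters are present in your build) should show whether WAL space is accounted separately or not. Here osd.0 is just an example OSD id.

    # Sketch: show BlueFS DB / WAL / slow-device usage side by side for one OSD (osd.0 here).
    ceph daemon osd.0 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, wal_total_bytes, wal_used_bytes, slow_total_bytes, slow_used_bytes}'

If wal_used_bytes stays at zero on an OSD without a separate WAL device, the WAL is presumably being counted inside db_used_bytes, which would inflate the per-object numbers a bit.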
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com