Re: Bluestore OSD_DATA, WAL & DB


> On 16 October 2017 at 18:14, Richard Hesketh <richard.hesketh@xxxxxxxxxxxx> wrote:
> 
> 
> On 16/10/17 13:45, Wido den Hollander wrote:
> >> On 26 September 2017 at 16:39, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> >> On 09/26/2017 01:10 AM, Dietmar Rieder wrote:
> >>> thanks David,
> >>>
> >>> that's confirming what I was assuming. Too bad that there is no
> >>> estimate/method to calculate the db partition size.
> >>
> >> It's possible that we might be able to get ranges for certain kinds of 
> >> scenarios.  Maybe if you do lots of small random writes on RBD, you can 
> >> expect a typical metadata size of X per object.  Or maybe if you do lots 
> >> of large sequential object writes in RGW, it's more like Y.  I think 
> >> it's probably going to be tough to make it accurate for everyone though.
> > 
> > So I did a quick test. I wrote 75,000 objects to a BlueStore device:
> > 
> > root@alpha:~# ceph daemon osd.0 perf dump|jq '.bluestore.bluestore_onodes'
> > 75085
> > root@alpha:~# 
> > 
> > I then saw the RocksDB database was 450MB in size:
> > 
> > root@alpha:~# ceph daemon osd.0 perf dump|jq '.bluefs.db_used_bytes'
> > 459276288
> > root@alpha:~#
> > 
> > 459276288 / 75085 = 6116
> > 
> > So about 6 KB of RocksDB data per object.
> > 
> > Let's say I want to store 1M objects in a single OSD: I would then need ~6 GB of DB space.
> > 
> > Is this a safe assumption? Do you think that 6 KB is normal? Low? High?
> > 
> > There aren't many of these numbers out there for BlueStore right now, so I'm trying to gather some.
> > 
> > Wido
> 
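
The arithmetic in the quoted test can be reproduced with a small bash sketch. The function names db_bytes_per_object and estimate_db_bytes are hypothetical helpers made up for illustration, not part of Ceph; the inputs are the two counters quoted above (bluefs.db_used_bytes and bluestore.bluestore_onodes). The result is a rough projection only, since it assumes the per-object cost stays constant as the OSD fills:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for back-of-envelope DB sizing; not part of Ceph.

# Integer bytes of RocksDB data per object (onode).
# $1 = bluefs.db_used_bytes, $2 = bluestore.bluestore_onodes
db_bytes_per_object() {
    local db_used_bytes=$1 onodes=$2
    if [ "$onodes" -eq 0 ]; then
        echo "no onodes reported" >&2
        return 1
    fi
    echo $(( db_used_bytes / onodes ))
}

# Projected DB bytes for a target object count.
# $1 = number of objects, $2 = bytes per object
# Assumption: per-object cost does not change with scale.
estimate_db_bytes() {
    local objects=$1 per_object=$2
    echo $(( objects * per_object ))
}

# Figures taken from the test quoted above.
per_object=$(db_bytes_per_object 459276288 75085)
echo "bytes per object: $per_object"
echo "DB bytes for 1M objects: $(estimate_db_bytes 1000000 "$per_object")"
```

With the quoted counters this gives 6116 bytes per object and 6116000000 bytes (roughly 6 GB) for one million objects.
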
> If I check for the same stats on OSDs in my production cluster I see similar but variable values:
> 
> root@vm-ds-01:~/ceph-conf# for i in {0..9} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
> osd.0 db per object: 7490
> osd.1 db per object: 7523
> osd.2 db per object: 7378
> osd.3 db per object: 7447
> osd.4 db per object: 7233
> osd.5 db per object: 7393
> osd.6 db per object: 7074
> osd.7 db per object: 7967
> osd.8 db per object: 7253
> osd.9 db per object: 7680
> 
> root@vm-ds-02:~# for i in {10..19} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
> osd.10 db per object: 5168
> osd.11 db per object: 5291
> osd.12 db per object: 5476
> osd.13 db per object: 4978
> osd.14 db per object: 5252
> osd.15 db per object: 5461
> osd.16 db per object: 5135
> osd.17 db per object: 5126
> osd.18 db per object: 9336
> osd.19 db per object: 4986
> 
> root@vm-ds-03:~# for i in {20..29} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
> osd.20 db per object: 5115
> osd.21 db per object: 4844
> osd.22 db per object: 5063
> osd.23 db per object: 5486
> osd.24 db per object: 5228
> osd.25 db per object: 4966
> osd.26 db per object: 5047
> osd.27 db per object: 5021
> osd.28 db per object: 5321
> osd.29 db per object: 5150
> 
> root@vm-ds-04:~# for i in {30..39} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
> osd.30 db per object: 6658
> osd.31 db per object: 6445
> osd.32 db per object: 6259
> osd.33 db per object: 6691
> osd.34 db per object: 6513
> osd.35 db per object: 6628
> osd.36 db per object: 6779
> osd.37 db per object: 6819
> osd.38 db per object: 6677
> osd.39 db per object: 6689
> 
> root@vm-ds-05:~# for i in {40..49} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
> osd.40 db per object: 5335
> osd.41 db per object: 5203
> osd.42 db per object: 5552
> osd.43 db per object: 5188
> osd.44 db per object: 5218
> osd.45 db per object: 5157
> osd.46 db per object: 4956
> osd.47 db per object: 5370
> osd.48 db per object: 5117
> osd.49 db per object: 5313
> 
> I'm not sure why there is so much variance (these nodes are basically identical). I think db_used_bytes includes the WAL, at least in my case, as I don't have a separate WAL device. I'm not sure how big the WAL is relative to the metadata, and hence how much it might throw these figures off, but ~6 KB/object seems like a reasonable value for back-of-envelope calculations.
> 

Yes, judging from your numbers, 6 KB/object seems reasonable. More data points are welcome.
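
To compare data points from different hosts, the "osd.N db per object: V" lines can be reduced to a count, minimum, maximum and mean. This is a minimal sketch that only assumes the output format shown in the quoted loops; summarize_db_per_object is a hypothetical name, and the mean is truncated to an integer:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize "osd.N db per object: V" lines from stdin.
# Quote prefixes such as "> " are harmless, since only the last field is read.
summarize_db_per_object() {
    awk '
        /db per object:/ {
            v = $NF + 0                      # last field is the byte count
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            sum += v
            n++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "n=%d min=%d max=%d mean=%d\n", n, min, max, int(sum / n)
        }
    '
}

# Three of the values reported for vm-ds-02, including the osd.18 outlier.
summarize_db_per_object <<'EOF'
osd.17 db per object: 5126
osd.18 db per object: 9336
osd.19 db per object: 4986
EOF
```

Printing the minimum and maximum next to the mean makes an outlier such as osd.18 visible instead of letting it hide in the average.
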

Some input from a BlueStore developer would be helpful as well, to confirm that we are not drawing the wrong conclusions here.

Wido

> [bonus hilarity]
> On my all-in-one-SSD OSDs, because bluestore reports them entirely as db space, I get results like:
> 
> root@vm-hv-01:~# for i in {60..65} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
> osd.60 db per object: 80273
> osd.61 db per object: 68859
> osd.62 db per object: 45560
> osd.63 db per object: 38209
> osd.64 db per object: 48258
> osd.65 db per object: 50525
> 
> Rich
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com