1. For 750 object write requests, data is written directly into the data
partition, and since we use EC 4+1 there will be 5 IOPS across the
cluster for each object write. This makes 750 * 5 = 3750 IOPS.
Don't forget about the metadata and the deferring of small writes.
Each OSD does: deferred write queue + metadata, then the data. That is
either 2 or 3 ops per OSD. The deferred write queue lives in the same
RocksDB, so deferred write queue + metadata should be 1 op, although a
slightly bigger one (8-12 KB for 4 KB writes). So it's either 3*5*750
or 2*5*750, depending on how your final statistics are collected.
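
To make that arithmetic explicit, here is a back-of-the-envelope sketch
in Python; the per-OSD op counts of 2 and 3 come from the reasoning
above, not from measurements:

requests = 750          # client object writes per second
ec_shards = 5           # EC 4+1 -> 5 OSDs touched per object write
# Per-OSD ops per write: 3 if deferred write queue, metadata and data
# are counted separately, 2 if deferred queue + metadata merge into
# one (bigger) RocksDB write.
for ops_per_osd in (2, 3):
    print(ops_per_osd, requests * ec_shards * ops_per_osd)
# -> 7500 IOPS (2 ops/OSD) or 11250 IOPS (3 ops/OSD) across the cluster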
2. For 750 attribute requests, each is first written into the RocksDB
WAL and then into RocksDB itself. So, 2 IOPS per disk for every
attribute request. This makes 750*2*5 = 7500 IOPS inside the cluster.
RocksDB is an LSM tree, so it doesn't write to the WAL and then to the
DB; it just writes to the WAL and compacts it at some point, merging
L0->L1->L2->... So in theory, without compaction, it should be
1*5*750 IOPS.
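
Same sketch for (2), assuming the LSM behaviour just described (one WAL
append per attribute update, compaction deferred):

requests = 750      # attribute writes per second
ec_shards = 5       # same EC 4+1 fan-out
wal_writes = 1      # LSM: one WAL append per update, no separate DB write
print(requests * ec_shards * wal_writes)   # -> 3750 IOPS, not 7500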
However, there is a bug that makes BlueStore do 2 writes+syncs instead
of 1 per journal write (not all the time, though). The first write is
RocksDB's WAL and the second one is BlueFS's journal. This probably
adds another 5*750 IOPS on top of each of (1) and (2).
So 5*((2 or 3)+1+2)*750 = either 18750 or 22500 IOPS. Per disk, that is
18750/120 = 156.25 or 22500/120 = 187.5.
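
Putting it all together in one sketch (the division by 120 assumes a
120-disk cluster, which appears to be the setup in question):

# Per-OSD ops per object: (2 or 3) from (1), 1 from the attribute WAL
# write in (2), plus 2 extra syncs from the double-write bug.
requests, ec_shards, disks = 750, 5, 120
for base in (2, 3):
    total = ec_shards * (base + 1 + 2) * requests
    print(total, total / disks)
# -> 18750 total / 156.25 per disk, or 22500 total / 187.5 per disk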
The rest may be compaction, or metadata reads if you update some
objects. Or maybe I'm missing something else. However, this is already
closer to your 200 IOPS :)
--
Vitaliy Filippov