Re: bluestore write iops calculation

Hi Team,
@vitalif@xxxxxxxxxx, thank you for the information. Could you please clarify the queries below as well:

1. The average object size we use is 256KB to 512KB; will there still be a deferred write queue?
2. Could you share a link to the existing RocksDB ticket for the bug that does 2 writes + syncs?
3. Is there any configuration by which we can reduce/optimize the IOPS?

Thanks,
Muthu


On Fri, Aug 2, 2019 at 6:21 PM <vitalif@xxxxxxxxxx> wrote:
> 1. For 750 object write requests, data is written directly into the data
> partition, and since we use EC 4+1 there will be 5 IOPS across the
> cluster for each object write. This makes 750 * 5 = 3750 IOPS.

Don't forget about the metadata and the deferring of small writes:
deferred write queue + metadata first, then data, for each OSD. That is
either 2 or 3 ops per OSD. The deferred write queue is in the same
RocksDB, so deferred write queue + metadata should be 1 op, although a
slightly bigger one (8-12 KB for 4 KB writes). So it's either 3*5*750 or
2*5*750, depending on how your final statistics are collected.
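
A minimal back-of-the-envelope sketch of that data-write estimate (Python; the 750 writes/s and EC 4+1 figures come from this thread, and the 2-vs-3 ops per OSD is the ambiguity described above):

objects_per_sec = 750       # object write requests per second, from the thread
shards = 5                  # EC 4+1: each object write touches 5 OSDs

# Per OSD: deferred write queue + metadata (same RocksDB), then data.
for ops_per_osd in (2, 3):  # 2 if queue+metadata count as one write, 3 if counted separately
    print(objects_per_sec * shards * ops_per_osd)   # 7500 and 11250 IOPS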

> 2. For 750 attribute requests, first it will be written into the
> RocksDB WAL and then to RocksDB. So, 2 IOPS per disk for every
> attribute request. This makes 750*2*5 = 7500 IOPS inside the cluster.

RocksDB is an LSM tree, so it doesn't write to the WAL and then to the DB;
it just writes to the WAL, then compacts it at some point and merges
L0->L1->L2->...

So in theory, without compaction, it should be 1*5*750 IOPS.

However, there is a bug that makes BlueStore do 2 writes+syncs instead
of 1 per journal write (not all the time, though). The first write
is RocksDB's WAL and the second one is BlueFS's journal. This
probably adds another 5*750 IOPS on top of each of (1) and (2).

So 5*((2 or 3)+1+2)*750 = either 18750 or 22500 IOPS; 18750/120 = 156.25,
22500/120 = 187.5.
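
Putting the whole estimate together as a rough sketch (Python; it just reproduces the arithmetic above, and assumes the /120 divisor is the number of disks, which is not stated here):

requests_per_sec = 750
shards = 5                     # EC 4+1: 5 OSDs touched per request
attr_ops = 1                   # (2): one RocksDB WAL write per OSD, in theory
extra_journal_ops = 2          # the 2-writes+syncs bug: one extra op for each of (1) and (2)
for data_ops in (2, 3):        # (1): deferred queue + metadata (+ data), 2 or 3 ops per OSD
    total = shards * (data_ops + attr_ops + extra_journal_ops) * requests_per_sec
    print(total, total / 120)  # 18750 -> 156.25, 22500 -> 187.5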

The rest may be compaction, or metadata reads if you update some objects.
Or maybe I'm missing something else. However, this is already closer to
your 200 IOPS :)

--
Vitaliy Filippov
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
