Re: bluestore write iops calculation

On Fri, Aug 2, 2019 at 2:51 PM <vitalif@xxxxxxxxxx> wrote:
>
> > 1. For 750 object write requests, data is written directly into the data
> > partition, and since we use EC 4+1 there will be 5 iops across the
> > cluster for each object write. This makes 750 * 5 = 3750 iops
>
> don't forget about the metadata and the deferring of small writes.
> deferred write queue + metadata, then data, for each OSD. this is either
> 2 or 3 ops per OSD. the deferred write queue is in the same RocksDB,
> so deferred write queue + metadata should be 1 op, although a slightly
> bigger one (8-12 kb for 4 kb writes). so it's either 3*5*750 or 2*5*750,
> depending on how your final statistics are collected

where small means 32kb or smaller going to BlueStore, so <= 128kb writes
from the client.
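
A minimal sketch of that cutoff and of the per-OSD ops for step (1), in
Python (the EC 4+1 split across 4 data chunks, the 32 kb deferral cutoff,
and the 2-vs-3 ops per OSD are the figures from this thread, not defaults
I've verified):

    # rough sketch: per-shard write size and step (1) ops for an EC 4+1 pool
    K = 4                      # data chunks in EC 4+1
    SHARDS = 5                 # k + m = 5 OSDs touched per object write
    DEFER_CUTOFF = 32 * 1024   # per-OSD writes at or below this are deferred (per above)

    def per_shard(client_write_bytes):
        return client_write_bytes // K

    print(per_shard(128 * 1024) <= DEFER_CUTOFF)   # True: 32 kb per shard, deferred path
    print(per_shard(256 * 1024) <= DEFER_CUTOFF)   # False: 64 kb per shard, direct write

    # step (1) from the thread: deferred queue + metadata (+ data) = 2 or 3 ops per OSD
    for ops_per_osd in (2, 3):
        print(ops_per_osd * SHARDS * 750)          # 7500 or 11250 iops cluster-wide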

Also: please don't do 4+1 erasure coding, see older discussions for details.


Paul

>
> > 2. For 750 attribute requests, each is first written into the RocksDB
> > WAL and then into RocksDB itself. So, 2 iops per disk for every
> > attribute request. This makes 750*2*5 = 7500 iops inside the cluster.
>
> RocksDB is an LSM tree, so it doesn't write to the WAL and then to the
> DB; it just writes to the WAL and compacts it at some point, merging
> L0->L1->L2->...
>
> so in theory, without compaction, it should be 1*5*750 iops
>
> however, there is a bug that makes bluestore do 2 writes+syncs instead
> of 1 for each journal write (not all the time, though). the first write
> is RocksDB's WAL and the second one is the BlueFS journal. this
> probably adds another 5*750 iops on top of each of (1) and (2).
>
> so 5*((2 or 3)+1+2)*750 = either 18750 or 22500. 18750/120 = 156.25,
> 22500/120 = 187.5
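
Putting those numbers together as a quick sketch (same assumptions as
above: 2 or 3 data ops, 1 WAL op, and 2 extra bluefs journal writes per
object per OSD; 120 is just the divisor used above, presumably the disk
count):

    # total iops estimate from the breakdown above: 750 object writes/s, EC 4+1
    SHARDS = 5          # OSDs written per object with EC 4+1
    WRITES = 750        # client object writes per second
    for data_ops in (2, 3):        # step (1): deferred queue + metadata (+ data)
        wal_ops = 1                # step (2): attribute write to the RocksDB WAL
        extra = 2                  # extra bluefs journal write on top of each of (1) and (2)
        total = SHARDS * (data_ops + wal_ops + extra) * WRITES
        print(total, total / 120)  # 18750 -> 156.25, 22500 -> 187.5 per disk
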
>
> the rest may be compaction, or metadata reads if you update some objects.
> or maybe I'm missing something else. however, this is already closer to
> your 200 iops :)
>
> --
> Vitaliy Filippov
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


