Re: bluestore write iops calculation

On 02/08/2019 08:54, nokia ceph wrote:
Hi Team,

Could you please help us understand the write iops inside the ceph cluster? There seems to be a mismatch between the theoretical iops and what we see in the disk statistics.

Our platform is a 5-node cluster with 120 OSDs, each node having 24 HDDs (data, rocksdb and rocksdb.WAL all reside on the same disk).

We use EC 4+1

We do only write operations, averaging 1500 write iops in total (750 objects/s and 750 attribute requests per second, a single key/value entry for each object). In ceph status we see a consistent 1500 write iops from the client.

Please correct us if our assumptions are wrong.
1. For the 750 object write requests, data is written directly into the data partition, and since we use EC 4+1 there will be 5 iops across the cluster for each object write. This makes 750 * 5 = 3750 iops.
2. For the 750 attribute requests, each is first written into rocksdb.WAL and then into rocksdb. So, 2 iops per disk for every attribute request. This makes 750 * 2 * 5 = 7500 iops inside the cluster.

Now the total iops inside the cluster would be 11250. We have 120 OSDs, hence each OSD should see 11250 / 120 = ~94 iops.

Currently we see an average of 200 iops per OSD for the same load in iostat, whereas the theoretical calculation gives only ~94 iops.

Could you please let us know where the remaining iops inside the cluster come from, for 1500 write iops from the client?

If each object write also ends up writing one metadata entry inside rocksdb, then we need to add another 3750 iops to the total, which brings each OSD to 125 iops; there is still a difference of 75 iops per OSD.
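
For reference, here is the arithmetic above written out as a small Python sketch (the rates, the EC 4+1 fan-out and the WAL + db double write are the assumptions stated above; the optional rocksdb metadata write per object is the case from the previous paragraph):

    # Back-of-the-envelope iops estimate for the workload described above.
    object_rate = 750          # object writes per second from the client
    attr_rate = 750            # attribute (single key/value) writes per second
    ec_shards = 5              # EC 4+1 -> 5 shards written per object
    osds = 120

    data_iops = object_rate * ec_shards        # 750 * 5     = 3750 data writes
    attr_iops = attr_rate * 2 * ec_shards      # 750 * 2 * 5 = 7500 WAL + db writes

    total = data_iops + attr_iops              # 11250
    print(total / osds)                        # 93.75 -> ~94 iops per OSD

    # if each object write also lands one metadata entry in rocksdb:
    meta_iops = object_rate * ec_shards        # another 3750
    print((total + meta_iops) / osds)          # 125 iops per OSD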

Thanks,
Muthu


Also, is your iostat reading write iops or total read+write iops (iostat tps)? Note there could be a metadata read op at the start of the first write op if it is not cached in memory.
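
A rough way to check, as a minimal sketch (assuming a Linux host; the device name "sdb" is just a placeholder for one of the OSD disks): sample /proc/diskstats over an interval and report read and write iops separately, then compare the write-only figure against the estimate above.

    import time

    def completed_ios(dev):
        # /proc/diskstats: field 4 = reads completed, field 8 = writes completed
        with open("/proc/diskstats") as f:
            for line in f:
                fields = line.split()
                if fields[2] == dev:
                    return int(fields[3]), int(fields[7])
        raise ValueError(f"device {dev} not found")

    dev = "sdb"                 # placeholder OSD data disk
    interval = 10               # seconds
    r0, w0 = completed_ios(dev)
    time.sleep(interval)
    r1, w1 = completed_ios(dev)
    print(f"{dev}: {(r1 - r0) / interval:.1f} read iops, "
          f"{(w1 - w0) / interval:.1f} write iops")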

/Maged


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
