Understanding throughput/bandwidth changes in object store

Env: Ceph 10.2.2, 6 nodes, 96 OSDs, journals on SSD (8 per SSD), OSDs are enterprise SATA disks, 50KB objects, dual 10 GbE, 3 copies of each object

I'm running some tests with COSBench against our object store, and I'm not really understanding what I'm seeing when changing the number of nodes. With 6 nodes, I get a write bandwidth of about 160 MB/s and 3300 operations per second. I suspect we're IOPS bound, so to verify, we thought we'd take down one node and see whether performance dropped by roughly 1/6th. I removed the OSDs, mon, and rgw from that node, waited for the rebalance to complete, then turned the node off. When I reran the same test, my bandwidth and operations per second were now a third of what they had been with 6 nodes. I'm at a loss to understand the huge impact of removing a single node; does anyone have an explanation?
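
For reference, here is the rough arithmetic behind my IOPS suspicion (just a back-of-the-envelope sketch: the 3x write fan-out per object, the even split across OSDs, and ignoring journal traffic on the SSDs are all my assumptions):

# Sanity check of the 6-node COSBench numbers.
# Assumptions: each client write fans out to 3 OSD data writes (one per
# replica), writes spread evenly over all OSDs, journal traffic ignored
# because journals are on separate SSDs.

object_size = 50 * 1024          # 50 KB objects
client_ops = 3300                # measured write ops/s with 6 nodes
replicas = 3
osds = 96

client_bw = client_ops * object_size / 1e6
writes_per_osd = client_ops * replicas / osds

print(f"client bandwidth  ~ {client_bw:.0f} MB/s")     # ~169 MB/s, close to the ~160 MB/s observed
print(f"data writes/OSD   ~ {writes_per_osd:.0f}/s")   # ~103/s per SATA disk

# Naive expectation after dropping one of six nodes (80 OSDs left):
# a proportional drop would leave ~5/6 of the throughput, i.e. ~2750 ops/s,
# not the ~1100 ops/s (one third) I actually measured.
osds_remaining = 80
expected_ops = client_ops * osds_remaining / osds
print(f"expected ops with 5 nodes ~ {expected_ops:.0f}/s")

Even ignoring any leftover recovery traffic, I would have expected to keep roughly 5/6 of the throughput, which is why the drop to a third surprises me.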




