Re: Understanding throughput/bandwidth changes in object store

On Tue, Aug 16, 2016 at 6:00 AM,  <hrast@xxxxxxxxx> wrote:
> Env: Ceph 10.2.2, 6 nodes, 96 OSDs, journals on SSD (8 per SSD), OSDs are enterprise SATA disks, 50KB objects, dual 10 GbE, 3 copies of each object
>
> I'm running some tests with COSBench against our object store, and I don't really understand what I'm seeing when I change the number of nodes. With 6 nodes, I get a write bandwidth of about 160MB/s and 3300 operations per second. I suspect we're IOPS-bound, so to verify, we thought we'd take down one node and see if performance dropped by roughly 1/6th. I removed the OSDs, mon, and rgw for that node from the cluster, waited for the rebalance to complete, and then turned off the node. When I reran the same test, my bandwidth and operations per second were a third of what they had been at 6 nodes. I'm at a loss to understand why removing a single node had such a huge impact; does anyone have an explanation?
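
For context, "removed the OSDs ... and waited for the rebalance to
complete" typically means marking the node's OSDs out and polling until
every PG returns to active+clean. A minimal Python sketch of that
sequence, wrapping the ceph CLI -- the OSD ids are hypothetical, and the
JSON field layout of "ceph -s -f json" is assumed from Jewel and can
vary between releases:

    import json
    import subprocess
    import time

    # OSD ids on the node being drained -- hypothetical; with 96 OSDs
    # across 6 nodes, each node holds 16.
    NODE_OSDS = range(80, 96)

    def ceph(*args):
        # Run a ceph CLI command and return its stdout as text.
        return subprocess.check_output(("ceph",) + args).decode()

    # Mark each OSD out so CRUSH remaps its PGs onto the remaining nodes.
    for osd in NODE_OSDS:
        ceph("osd", "out", str(osd))

    # Poll until every PG is active+clean, i.e. the rebalance is done.
    while True:
        status = json.loads(ceph("-s", "-f", "json"))
        pgmap = status["pgmap"]  # field names assumed; may vary by release
        clean = sum(s["count"] for s in pgmap["pgs_by_state"]
                    if s["state_name"] == "active+clean")
        if clean == pgmap["num_pgs"]:
            break
        time.sleep(30)

    print("all PGs active+clean; safe to stop the node's daemons")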

That's pretty odd. Can you describe your test and Ceph config in a
little more detail?
E.g., it sounds like you have a monitor on each node? Did bandwidth or
IOPS decrease, and did you test them independently? What did your
rebalance end up looking like and how did you check it was complete?
-Greg
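
On testing bandwidth and IOPS independently: one common way to separate
them is rados bench, since a small-object write test is dominated by
per-operation overhead while a large-object test approaches the network
and disk bandwidth limit. A minimal sketch, assuming a scratch pool
named testpool (hypothetical; don't bench against a production pool):

    import subprocess

    POOL = "testpool"  # hypothetical scratch pool, created for the test

    def bench_write(block_size, threads, seconds=60):
        # rados bench keeps `threads` writes of `block_size` bytes in
        # flight for `seconds` and reports bandwidth, IOPS, and latency.
        subprocess.check_call([
            "rados", "-p", POOL, "bench", str(seconds), "write",
            "-b", str(block_size), "-t", str(threads), "--no-cleanup",
        ])

    bench_write(51200, 32)    # 50 KB objects: per-op overhead dominates
    bench_write(4194304, 32)  # 4 MB objects: pushes toward the 10 GbE limit

Afterwards, "rados -p testpool cleanup" removes the leftover benchmark
objects. If both runs drop by the same factor once the node is removed,
the bottleneck is likely something other than raw disk IOPS (e.g. the
network, a struggling mon, or an uneven CRUSH distribution after the
rebalance).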
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


