On Fri, Aug 23, 2013 at 9:53 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
Okay. It's important to realize that because Ceph distributes data
pseudorandomly, each OSD is going to end up with about the same amount
of data going to it. If one of your drives is slower than the others,
the fast ones can get backed up waiting on the slow one to acknowledge
writes, so slow drives end up having a disproportionate impact on
cluster throughput. :(
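A toy model of the slow-drive effect described above, as a minimal sketch. It assumes every OSD receives an equal share of each write batch and that the batch is acknowledged only once the slowest OSD finishes (a lockstep simplification, not Ceph's real I/O path). The function name `cluster_throughput` and the per-OSD speeds are hypothetical and chosen for illustration, not measured on this cluster.

```python
def cluster_throughput(osd_speeds_mb_s, batch_mb=1000.0):
    """Aggregate write MB/s for one batch spread evenly over all OSDs.

    Toy model (an assumption, not Ceph's real I/O path): every OSD gets
    an equal share, and the batch is only acknowledged once the slowest
    OSD has finished writing its share.
    """
    share_mb = batch_mb / len(osd_speeds_mb_s)
    batch_time_s = max(share_mb / speed for speed in osd_speeds_mb_s)
    return batch_mb / batch_time_s


# Hypothetical speeds: 11 healthy OSDs vs. the same set with one slow drive.
all_fast = [50.0] * 11
one_slow = [50.0] * 10 + [20.0]
print(round(cluster_throughput(all_fast), 1))  # 550.0
print(round(cluster_throughput(one_slow), 1))  # 220.0
```

This lockstep model is a worst case. The aggregate collapses to OSD count × slowest speed (11 × 20 = 220 MB/s). Real clusters keep many requests in flight, so read it as showing the direction of the effect, not its exact size.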
Anyway, I'm guessing you have 24 OSDs from your math earlier?
47MB/s * 24 / 2 = 564MB/s
41MB/s * 24 / 2 = 492MB/s
33 OSDs and 3 hosts in the cluster.
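The estimates above follow the pattern per-disk MB/s × OSD count / 2. The small helper below redoes them for both the guessed 24 OSDs and the actual 33. The divisor of 2 comes straight from the math above. The thread doesn't say what it stands for (for example 2x replication, or journal and data sharing a disk), so the parameter name `write_multiplier` and its meaning are assumptions.

```python
def expected_client_mb_s(per_osd_mb_s, num_osds, write_multiplier=2):
    """Rough client-visible write ceiling: raw disk bandwidth divided by
    how many times each client byte is written (the '/ 2' above)."""
    return per_osd_mb_s * num_osds / write_multiplier


# Per-disk figures from the thread; compare 24 OSDs (guess) with 33 (actual).
for per_osd in (47, 41):
    print(per_osd, expected_client_mb_s(per_osd, 24), expected_client_mb_s(per_osd, 33))
# 47 564.0 775.5
# 41 492.0 676.5
```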
So taking out or reducing the weight on the slow ones might improve
things a little. But that's still quite a ways off from what you're
seeing. There are a lot of things that could be impacting this, but
with a gap that large there's probably something fairly obvious.
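The following is a toy model of the reweighting suggestion, as an assumption-laden sketch rather than how Ceph actually schedules writes. Each OSD gets a share of a write batch proportional to its weight, which is roughly what lowering a slow OSD's CRUSH weight aims for, and the batch completes when the last OSD finishes. The function name, speeds and weights are hypothetical.

```python
def weighted_throughput(osd_speeds_mb_s, weights, batch_mb=1000.0):
    """Aggregate MB/s when OSD i receives weights[i] / sum(weights) of a batch.

    Lockstep assumption: the batch finishes only when the last OSD does.
    """
    total_weight = sum(weights)
    batch_time_s = max(
        (batch_mb * w / total_weight) / speed
        for speed, w in zip(osd_speeds_mb_s, weights)
    )
    return batch_mb / batch_time_s


speeds = [50.0] * 10 + [20.0]                  # hypothetical: one slow drive
uniform = [1.0] * len(speeds)                  # equal data per OSD
by_speed = [s / max(speeds) for s in speeds]   # slow drive weighted to 0.4
print(round(weighted_throughput(speeds, uniform), 1))           # 220.0
print(round(weighted_throughput(speeds, by_speed), 1))          # 520.0
print(round(weighted_throughput([50.0] * 10, [1.0] * 10), 1))   # 500.0, slow drive removed
```

In this model both options help. Removing the slow drive gives 500 MB/s and weighting by speed gives 520 MB/s, versus 220 MB/s with uniform placement. As noted above, though, that alone is unlikely to explain a gap this large.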
What is the exact benchmark you're running? What do your nodes look like?
The write benchmark I'm running is fio, with the following configuration:
ioengine: "libaio"
iodepth: 16
runtime: 180
numjobs: 16
- name: "128k-500M-write"
description: "128K block 500M write"
bs: "128K"
size: "500M"
rw: "write"
Sorry for the weird YAML formatting; I'm copying it from the config file of my automation stuff.
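To reproduce the run without the automation wrapper, the same options can be written as a plain fio job file. fio job files are INI-style, with `key=value` options, and a `[global]` section applies to every job. The sketch below renders one from a dict. The split between global and job options mirrors the YAML above, and the names `GLOBAL_OPTS`, `JOB` and `render_fio_job` are hypothetical.

```python
# Options copied from the YAML above; the grouping is an assumption.
GLOBAL_OPTS = {
    "ioengine": "libaio",
    "iodepth": 16,
    "runtime": 180,
    "numjobs": 16,
}

JOB = {
    "name": "128k-500M-write",
    "description": "128K block 500M write",
    "bs": "128K",
    "size": "500M",
    "rw": "write",
}


def render_fio_job(global_opts, job):
    """Render options as an INI-style fio job file string."""
    lines = ["[global]"]
    lines += [f"{key}={value}" for key, value in global_opts.items()]
    lines.append("")
    # The job's name becomes its section header instead of a key=value line.
    lines.append(f"[{job['name']}]")
    lines += [f"{key}={value}" for key, value in job.items() if key != "name"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_fio_job(GLOBAL_OPTS, JOB))
```

Save the printed text to a file and pass that file to `fio` to run it directly.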
I run that on power-of-2 numbers of VMs, up to 32. Each VM is qemu-kvm with a 50 GB RBD-backed Cinder volume attached. They are 2 vCPU, 4 GB RAM VMs.
The host machines are Dell C6220s: 16-core with hyperthreading, 128 GB RAM, and bonded 10 Gbps NICs (mode 4, 20 Gbps throughput -- tested and verified that's working correctly). There are 2 host machines with 16 VMs each.
The Ceph cluster is made up of Dell C6220s with the same NIC setup, 256 GB RAM, the same CPU, and 12 disks each (one for the OS, 11 for OSDs).
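The following back-of-the-envelope sketch shows the load this setup generates at the 32-VM point, using only the numbers above. It assumes fio's `size` and `iodepth` apply per job, so each of the 16 `numjobs` clones writes 500M with 16 I/Os in flight. It also uses decimal units for the network (20 Gbps = 2500 MB/s per bonded host) and ignores protocol overhead and Ceph replication traffic. The constant names are hypothetical.

```python
VMS = 32
NUMJOBS = 16
IODEPTH = 16
SIZE_MB_PER_JOB = 500        # fio "size" applies per job
CLIENT_HOSTS = 2
BOND_GBPS = 20               # mode 4 bond, verified at 20 Gbps
STORAGE_HOSTS = 3
OSDS_PER_HOST = 11           # 12 disks minus one for the OS

outstanding_ios = VMS * NUMJOBS * IODEPTH
total_written_mb = VMS * NUMJOBS * SIZE_MB_PER_JOB
client_net_mb_s = CLIENT_HOSTS * BOND_GBPS * 1000 / 8
total_osds = STORAGE_HOSTS * OSDS_PER_HOST

print(outstanding_ios)   # 8192
print(total_written_mb)  # 256000
print(client_net_mb_s)   # 5000.0
print(total_osds)        # 33
```

The client-side network ceiling of about 5000 MB/s sits far above the disk-based estimates worked out from the per-OSD figures earlier in the thread.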
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com