5000 is actually really good, if you ask me, assuming everything is connected via gigabit. If you get 40k IOPS locally, add the latency of TCP plus that of the Ceph services and the VM layer, and that's what you get. On my network I get about a 0.1 ms round trip on gigabit over the same switch, which by definition limits a stream of requests issued one at a time to 10,000 IOPS (1 s / 0.1 ms). Then if you have storage on the other end capable of 40k IOPS (0.025 ms per IO), you add the latencies together (0.1 ms + 0.025 ms = 0.125 ms) and you're at 8k IOPS. Then add the small latency of the application servicing the IO (NFS, Ceph, etc.) and the latency introduced by your VM layer, and 5k sounds about right.

The good news is that you probably aren't taxing the storage, so you can likely run many simultaneous tests from several VMs and get the same results from each. You can try adding --numjobs to your fio command to parallelize the specific test you're doing, or launch a second VM and run the same test at the same time. This would be a good indicator of whether latency is the limit.

On Wed, Oct 31, 2012 at 10:29 AM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>>> Have you tried increasing the iodepth?
> Yes, I have tried with 100 and 200, same results.
>
> I have also tried directly from the host, with /dev/rbd1, and I get the same result.
> I have also tried with 3 different hosts, with different CPU models.
>
> (note: I can reach around 40,000 IOPS with the same fio config on a ZFS iSCSI array)
>
> My test Ceph cluster nodes' CPUs are old (Xeon E5420), but they are at around 10% usage, so I think that's OK.
>
>
> Do you have an idea how I can trace something?
>
> Thanks,
>
> Alexandre
>
> ----- Original Message -----
>
> From: "Sage Weil" <sage@xxxxxxxxxxx>
> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, 31 October 2012 16:57:05
> Subject: Re: slow fio random read benchmark, need help
>
> On Wed, 31 Oct 2012, Alexandre DERUMIER wrote:
>> Hello,
>>
>> I'm doing some tests with fio from a qemu 1.2 guest (virtio disk, cache=none): randread with a 4K block size on a small 1G region (so it can be handled by the buffer cache on the Ceph cluster).
>>
>>
>> fio --filename=/dev/vdb --rw=randread --bs=4K --size=1000M --iodepth=40 --group_reporting --name=file1 --ioengine=libaio --direct=1
>>
>>
>> I can't get more than 5000 IOPS.
>
> Have you tried increasing the iodepth?
>
> sage
>
>>
>>
>> RBD cluster is:
>> ---------------
>> 3 nodes, each with:
>> - 6 x OSD on 15k drives (XFS), journal on tmpfs, 1 mon
>> - CPU: 2 x 4-core Intel Xeon E5420 @ 2.5 GHz
>> rbd 0.53
>>
>> ceph.conf
>>
>> journal dio = false
>> filestore fiemap = false
>> filestore flusher = false
>> osd op threads = 24
>> osd disk threads = 24
>> filestore op threads = 6
>>
>> KVM host is: 4 x 12-core Opteron
>> ------------
>>
>>
>> During the bench:
>>
>> on Ceph nodes:
>> - CPU is around 10% used
>> - iostat shows no disk activity on the OSDs (so I think the 1G file is handled in the Linux buffer cache)
>>
>>
>> on KVM host:
>>
>> - CPU is around 20% used
>>
>>
>> I really don't see where the bottleneck is...
>>
>> Any ideas, hints?
>>
>>
>>
>> Regards,
>>
>> Alexandre
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
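The latency arithmetic in the reply at the top of this thread can be checked with a few lines of Python. This is a minimal sketch, assuming requests are issued one at a time (queue depth 1) and that per-hop latencies simply add; the inputs are the rough figures quoted in the thread, not new measurements. The 0.075 ms "overhead" is not measured either: it is just what remains of the 0.2 ms per IO implied by the observed 5,000 IOPS. The helper names (`iops_at_qd1`, `latency_ms_at_qd1`, `ideal_aggregate_iops`) are hypothetical, and `ideal_aggregate_iops` is an idealized Little's-law estimate with no contention, included only to show why running parallel jobs (e.g. fio's --numjobs) is a useful latency test.

```python
"""Back-of-the-envelope latency budget for the figures in the reply.

Illustrative only: inputs are the rough numbers quoted in the thread, and the
model ignores queueing, contention and CPU effects.
"""


def iops_at_qd1(latency_ms: float) -> float:
    """Best-case IOPS with one request in flight at a time (1000 ms / latency)."""
    return 1000.0 / latency_ms


def latency_ms_at_qd1(iops: float) -> float:
    """Per-request latency (ms) implied by an IOPS figure at queue depth 1."""
    return 1000.0 / iops


def ideal_aggregate_iops(streams: int, latency_ms: float) -> float:
    """Idealized Little's law: N independent streams, each waiting latency_ms.

    Assumes the backend is not saturated, so per-request latency stays constant.
    """
    return streams * iops_at_qd1(latency_ms)


network_rtt_ms = 0.1                     # ~0.1 ms round trip, gigabit, same switch
storage_ms = latency_ms_at_qd1(40_000)   # 40k IOPS locally -> 0.025 ms per IO
serial_ms = network_rtt_ms + storage_ms  # serialized requests: latencies add

observed_iops = 5_000
observed_ms = latency_ms_at_qd1(observed_iops)
overhead_ms = observed_ms - serial_ms    # budget left for Ceph daemons + VM layer

if __name__ == "__main__":
    print(f"network only      : {iops_at_qd1(network_rtt_ms):,.0f} IOPS")
    print(f"network + storage : {iops_at_qd1(serial_ms):,.0f} IOPS")
    print(f"implied overhead  : {overhead_ms:.3f} ms per IO")
    print(f"4 streams (ideal) : {ideal_aggregate_iops(4, observed_ms):,.0f} IOPS")
```

If several parallel jobs each still reach roughly 5,000 IOPS, the per-request latency rather than the storage is the limit, which is the reply's point; if the total stays flat at 5,000, something is serializing the requests.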