I'm not sure that latency addition is quite correct. Most use cases do
multiple IOs at the same time, and good benchmarks tend to reflect that. I
suspect the IO limitations here are a result of QEMU's storage handling (or
possibly our client layer) more than anything else — Josh can talk about that
more than I can, though!
-Greg

On Thu, Nov 1, 2012 at 8:38 AM, Dietmar Maurer <dietmar@xxxxxxxxxxx> wrote:
> I do not really understand that network latency argument.
>
> If one can get 40K iops with iSCSI, why can't I get the same with rados/ceph?
>
> Note: network latency is the same in both cases.
>
> What do I miss?
>
>> -----Original Message-----
>> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Alexandre DERUMIER
>> Sent: Wednesday, 31 October 2012 18:27
>> To: Marcus Sorensen
>> Cc: Sage Weil; ceph-devel
>> Subject: Re: slow fio random read benchmark, need help
>>
>> Thanks Marcus,
>>
>> indeed gigabit ethernet.
>>
>> note that my iscsi results (40k) were with multipath, so multiple gigabit links.
>>
>> I have also done tests with a netapp array, with nfs, single link; I'm around 13000 iops.
>>
>> I will do more tests with multiple vms, from different hosts, and with --numjobs.
>>
>> I'll keep you in touch,
>>
>> Thanks for help,
>>
>> Regards,
>>
>> Alexandre
>>
>>
>> ----- Original Message -----
>>
>> From: "Marcus Sorensen" <shadowsor@xxxxxxxxx>
>> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
>> Cc: "Sage Weil" <sage@xxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
>> Sent: Wednesday, 31 October 2012 18:08:11
>> Subject: Re: slow fio random read benchmark, need help
>>
>> 5000 is actually really good, if you ask me, assuming everything is connected
>> via gigabit. If you get 40k iops locally, you add the latency of tcp, as well as
>> that of the ceph services and VM layer, and that's what you get. On my
>> network I get about a .1ms round trip on gigabit over the same switch, which
>> by definition can only do 10,000 iops. Then if you have storage on the other
>> end capable of 40k iops, you add the latencies together (.1ms + .025ms) and
>> you're at 8k iops.
>> Then add the small latency of the application servicing the io (NFS, Ceph, etc),
>> and the latency introduced by your VM layer, and 5k sounds about right.
>>
>> The good news is that you probably aren't taxing the storage; you can likely
>> do many simultaneous tests from several VMs and get the same results.
>>
>> You can try adding --numjobs to your fio to parallelize the specific test you're
>> doing, or launching a second VM and doing the same test at the same time.
>> That would be a good indicator of whether it's latency.
>>
>> On Wed, Oct 31, 2012 at 10:29 AM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>> >>>Have you tried increasing the iodepth?
>> > Yes, I have tried with 100 and 200, same results.
>> >
>> > I have also tried directly from the host, with /dev/rbd1, and I get the same result.
>> > I have also tried with 3 different hosts, with different cpu models.
>> >
>> > (note: I can reach around 40,000 iops with the same fio config on a zfs iscsi array)
>> >
>> > My test ceph cluster nodes' cpus are old (xeon E5420), but they are around 10% usage, so I think it's ok.
>> >
>> >
>> > Do you have an idea if I can trace something?
>> >
>> > Thanks,
>> >
>> > Alexandre
>> >
>> > ----- Original Message -----
>> >
>> > From: "Sage Weil" <sage@xxxxxxxxxxx>
>> > To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
>> > Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
>> > Sent: Wednesday, 31 October 2012 16:57:05
>> > Subject: Re: slow fio random read benchmark, need help
>> >
>> > On Wed, 31 Oct 2012, Alexandre DERUMIER wrote:
>> >> Hello,
>> >>
>> >> I'm doing some tests with fio from a qemu 1.2 guest (virtio disk, cache=none),
>> >> randread, with a 4K block size on a small size of 1G (so it can be handled by
>> >> the buffer cache on the ceph cluster).
>> >>
>> >> fio --filename=/dev/vdb --rw=randread --bs=4K --size=1000M --iodepth=40 --group_reporting --name=file1 --ioengine=libaio --direct=1
>> >>
>> >> I can't get more than 5000 iops.
>> >
>> > Have you tried increasing the iodepth?
>> >
>> > sage
>> >
>> >> RBD cluster is:
>> >> ---------------
>> >> 3 nodes, with each node:
>> >> - 6 x osd 15k drives (xfs), journal on tmpfs, 1 mon
>> >> - cpu: 2x 4-core intel xeon E5420 @ 2.5GHz
>> >> rbd 0.53
>> >>
>> >> ceph.conf
>> >>
>> >> journal dio = false
>> >> filestore fiemap = false
>> >> filestore flusher = false
>> >> osd op threads = 24
>> >> osd disk threads = 24
>> >> filestore op threads = 6
>> >>
>> >> kvm host is: 4 x 12-core opteron
>> >> ------------
>> >>
>> >> During the bench:
>> >>
>> >> on ceph nodes:
>> >> - cpu is around 10% used
>> >> - iostat shows no disk activity on the osds (so I think the 1G file is
>> >>   handled in the linux buffer cache)
>> >>
>> >> on the kvm host:
>> >> - cpu is around 20% used
>> >>
>> >> I really don't see where the bottleneck is....
>> >>
>> >> Any ideas, hints?
>> >>
>> >> Regards,
>> >>
>> >> Alexandre
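
To make the latency arithmetic above concrete, here is a minimal Python sketch of the back-of-the-envelope model Marcus describes and the parallelism point Greg raises. The 0.1 ms round trip and 0.025 ms storage service time are the figures quoted by Marcus; the variable names, the 40k-iops backend cap, and the assumption that iops scale linearly with the number of IOs in flight are illustrative simplifications, not a description of how QEMU or the rbd client actually schedules requests.

# Back-of-the-envelope IOPS model for the numbers discussed in this thread.
# Assumes (illustrative only): a fixed ~0.1 ms network round trip, a backend
# good for ~40k iops (~0.025 ms service time), and ideal pipelining of
# however many IOs are kept in flight (Little's law, no other bottleneck).

NET_RTT_S = 0.0001          # ~0.1 ms round trip on gigabit (Marcus's figure)
STORAGE_SVC_S = 0.000025    # ~0.025 ms per IO for a ~40k iops backend
BACKEND_LIMIT_IOPS = 40_000

def ideal_iops(queue_depth: int) -> float:
    """Idealized IOPS with `queue_depth` IOs kept in flight at all times."""
    per_io_latency = NET_RTT_S + STORAGE_SVC_S   # 0.125 ms end to end
    return min(queue_depth / per_io_latency, BACKEND_LIMIT_IOPS)

if __name__ == "__main__":
    for qd in (1, 2, 4, 8, 40):
        print(f"queue depth {qd:2d}: ~{ideal_iops(qd):,.0f} iops")

At queue depth 1 this reproduces Marcus's ~8k estimate (1 / 0.125 ms). At the iodepth=40 Alexandre is using, the idealized ceiling would be the 40k backend limit, so the observed ~5k points at per-request overhead somewhere in QEMU's storage handling or the client layer rather than at raw network latency, which is the suspicion Greg voices above.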