I'm not sure that latency addition is quite correct. Most use cases do
multiple IOs at the same time, and good benchmarks tend to reflect that. I
suspect the IO limitations here are a result of QEMU's storage handling (or
possibly our client layer) more than anything else — Josh can talk about that
more than I can, though!
-Greg

On Thu, Nov 1, 2012 at 8:38 AM, Dietmar Maurer <dietmar@xxxxxxxxxxx> wrote:
> I do not really understand that network latency argument.
>
> If one can get 40K iops with iSCSI, why can't I get the same with rados/ceph?
>
> Note: network latency is the same in both cases.
>
> What do I miss?
>
>> -----Original Message-----
>> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Alexandre DERUMIER
>> Sent: Wednesday, 31 October 2012 18:27
>> To: Marcus Sorensen
>> Cc: Sage Weil; ceph-devel
>> Subject: Re: slow fio random read benchmark, need help
>>
>> Thanks Marcus,
>>
>> indeed gigabit ethernet.
>>
>> note that my iscsi results (40k) were with multipath, so multiple gigabit links.
>>
>> I have also done tests with a netapp array, with nfs, single link; I'm around 13000 iops.
>>
>> I will do more tests with multiple vms, from different hosts, and with --numjobs.
>>
>> I'll keep you in touch,
>>
>> Thanks for help,
>>
>> Regards,
>>
>> Alexandre
>>
>>
>> ----- Original Message -----
>>
>> From: "Marcus Sorensen" <shadowsor@xxxxxxxxx>
>> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
>> Cc: "Sage Weil" <sage@xxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
>> Sent: Wednesday, 31 October 2012 18:08:11
>> Subject: Re: slow fio random read benchmark, need help
>>
>> 5000 is actually really good, if you ask me, assuming everything is connected
>> via gigabit. If you get 40k iops locally, you add the latency of tcp, as well as
>> that of the ceph services and VM layer, and that's what you get. On my
>> network I get about a .1ms round trip on gigabit over the same switch, which
>> by definition can only do 10,000 iops. Then if you have storage on the other
>> end capable of 40k iops, you add the latencies together (.1ms + .025ms) and
>> you're at 8k iops.
>> Then add the small latency of the application servicing the io (NFS, Ceph, etc),
>> and the latency introduced by your VM layer, and 5k sounds about right.
>>
>> The good news is that you probably aren't taxing the storage; you can likely
>> do many simultaneous tests from several VMs and get the same results.
>>
>> You can try adding --numjobs to your fio to parallelize the specific test you're
>> doing, or launching a second VM and doing the same test at the same time.
>> That would be a good indicator of whether it's latency.
>>
>> On Wed, Oct 31, 2012 at 10:29 AM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>> >>>Have you tried increasing the iodepth?
>> > Yes, I have tried with 100 and 200, same results.
>> >
>> > I have also tried directly from the host, with /dev/rbd1, and I get the same result.
>> > I have also tried with 3 different hosts, with different cpu models.
>> >
>> > (note: I can reach around 40,000 iops with the same fio config on a zfs iscsi array)
>> >
>> > My test ceph cluster nodes' cpus are old (xeon E5420), but they are around 10% usage, so I think it's ok.
>> >
>> >
>> > Do you have an idea if I can trace something?
>> >
>> > Thanks,
>> >
>> > Alexandre
>> >
>> > ----- Original Message -----
>> >
>> > From: "Sage Weil" <sage@xxxxxxxxxxx>
>> > To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
>> > Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
>> > Sent: Wednesday, 31 October 2012 16:57:05
>> > Subject: Re: slow fio random read benchmark, need help
>> >
>> > On Wed, 31 Oct 2012, Alexandre DERUMIER wrote:
>> >> Hello,
>> >>
>> >> I'm doing some tests with fio from a qemu 1.2 guest (virtio disk, cache=none),
>> >> randread, with a 4K block size on a small size of 1G (so it can be handled by
>> >> the buffer cache on the ceph cluster).
>> >>
>> >> fio --filename=/dev/vdb --rw=randread --bs=4K --size=1000M --iodepth=40 --group_reporting --name=file1 --ioengine=libaio --direct=1
>> >>
>> >> I can't get more than 5000 iops.
>> >
>> > Have you tried increasing the iodepth?
>> >
>> > sage
>> >
>> >> RBD cluster is:
>> >> ---------------
>> >> 3 nodes, with each node:
>> >> - 6 x osd 15k drives (xfs), journal on tmpfs, 1 mon
>> >> - cpu: 2x 4-core intel xeon E5420 @ 2.5GHz
>> >> rbd 0.53
>> >>
>> >> ceph.conf
>> >>
>> >> journal dio = false
>> >> filestore fiemap = false
>> >> filestore flusher = false
>> >> osd op threads = 24
>> >> osd disk threads = 24
>> >> filestore op threads = 6
>> >>
>> >> kvm host is: 4 x 12-core opteron
>> >> ------------
>> >>
>> >> During the bench:
>> >>
>> >> on ceph nodes:
>> >> - cpu is around 10% used
>> >> - iostat shows no disk activity on the osds (so I think the 1G file is
>> >>   handled in the linux buffer cache)
>> >>
>> >> on the kvm host:
>> >> - cpu is around 20% used
>> >>
>> >> I really don't see where the bottleneck is....
>> >>
>> >> Any ideas, hints?
>> >>
>> >> Regards,
>> >>
>> >> Alexandre
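
To make the latency arithmetic above concrete, here is a minimal Python sketch of the back-of-the-envelope model Marcus describes and the parallelism point Greg raises. The 0.1 ms round trip and 0.025 ms storage service time are the figures quoted by Marcus; the variable names, the 40k-iops backend cap, and the assumption that iops scale linearly with the number of IOs in flight are illustrative simplifications, not a description of how QEMU or the rbd client actually schedules requests.

# Back-of-the-envelope IOPS model for the numbers discussed in this thread.
# Assumes (illustrative only): a fixed ~0.1 ms network round trip, a backend
# good for ~40k iops (~0.025 ms service time), and ideal pipelining of
# however many IOs are kept in flight (Little's law, no other bottleneck).

NET_RTT_S = 0.0001          # ~0.1 ms round trip on gigabit (Marcus's figure)
STORAGE_SVC_S = 0.000025    # ~0.025 ms per IO for a ~40k iops backend
BACKEND_LIMIT_IOPS = 40_000

def ideal_iops(queue_depth: int) -> float:
    """Idealized IOPS with `queue_depth` IOs kept in flight at all times."""
    per_io_latency = NET_RTT_S + STORAGE_SVC_S   # 0.125 ms end to end
    return min(queue_depth / per_io_latency, BACKEND_LIMIT_IOPS)

if __name__ == "__main__":
    for qd in (1, 2, 4, 8, 40):
        print(f"queue depth {qd:2d}: ~{ideal_iops(qd):,.0f} iops")

At queue depth 1 this reproduces Marcus's ~8k estimate (1 / 0.125 ms). At the iodepth=40 Alexandre is using, the idealized ceiling would be the 40k backend limit, so the observed ~5k points at per-request overhead somewhere in QEMU's storage handling or the client layer rather than at raw network latency, which is the suspicion Greg voices above.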