I do not really understand that network latency argument. If one can get 40K IOPS with iSCSI, why can't I get the same with rados/ceph? Note: network latency is the same in both cases. What am I missing?

> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Alexandre DERUMIER
> Sent: Wednesday, 31 October 2012 18:27
> To: Marcus Sorensen
> Cc: Sage Weil; ceph-devel
> Subject: Re: slow fio random read benchmark, need help
>
> Thanks Marcus,
>
> indeed, gigabit ethernet.
>
> Note that my iSCSI results (40k) were with multipath, so multiple gigabit links.
>
> I have also done tests with a NetApp array, over NFS on a single link; there I'm around 13000 IOPS.
>
> I will do more tests with multiple VMs, from different hosts, and with --numjobs.
>
> I'll keep you posted,
>
> Thanks for the help,
>
> Regards,
>
> Alexandre
>
> ----- Original Mail -----
>
> From: "Marcus Sorensen" <shadowsor@xxxxxxxxx>
> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
> Cc: "Sage Weil" <sage@xxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, 31 October 2012 18:08:11
> Subject: Re: slow fio random read benchmark, need help
>
> 5000 is actually really good, if you ask me, assuming everything is connected via gigabit. If you get 40k IOPS locally, you add the latency of TCP, as well as that of the Ceph services and the VM layer, and that's what you get. On my network I get about a .1ms round trip on gigabit over the same switch, which for a single outstanding request can by definition only do 10,000 IOPS. Then if you have storage on the other end capable of 40k IOPS, you add the latencies together (.1ms + .025ms) and you're at 8k IOPS. Then add the small latency of the application servicing the I/O (NFS, Ceph, etc.) and the latency introduced by your VM layer, and 5k sounds about right.
>
> The good news is that you probably aren't taxing the storage; you can likely run many simultaneous tests from several VMs and get the same results.
>
> You can try adding --numjobs to your fio invocation to parallelize the specific test you're doing, or launch a second VM and run the same test at the same time. That would be a good indicator of whether it's latency.
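>
> In numbers, a back-of-the-envelope sketch in Python (it assumes a single outstanding request at a time and uses the figures above, a 0.1 ms RTT and a backend good for 40k IOPS locally):
>
>     # With one I/O in flight, per-request latencies add up,
>     # and throughput is the reciprocal of the total.
>     network_rtt = 0.1e-3        # seconds: gigabit RTT over one switch
>     storage_svc = 1.0 / 40000   # 40k IOPS locally => 0.025 ms per I/O
>
>     print("network alone:   %.0f IOPS" % (1.0 / network_rtt))                  # 10000
>     print("network+storage: %.0f IOPS" % (1.0 / (network_rtt + storage_svc)))  # 8000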
> On Wed, Oct 31, 2012 at 10:29 AM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
> >>> Have you tried increasing the iodepth?
> >
> > Yes, I have tried with 100 and 200; same results.
> >
> > I have also tried directly from the host, with /dev/rbd1, and I get the same result.
> > I have also tried with 3 different hosts, with different CPU models.
> >
> > (Note: I can reach around 40,000 IOPS with the same fio config on a ZFS iSCSI array.)
> >
> > My test ceph cluster nodes' CPUs are old (Xeon E5420), but they are at around 10% usage, so I think that's OK.
> >
> > Do you have an idea of what I could trace?
> >
> > Thanks,
> >
> > Alexandre
> >
> > ----- Original Mail -----
> >
> > From: "Sage Weil" <sage@xxxxxxxxxxx>
> > To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
> > Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> > Sent: Wednesday, 31 October 2012 16:57:05
> > Subject: Re: slow fio random read benchmark, need help
> >
> > On Wed, 31 Oct 2012, Alexandre DERUMIER wrote:
> >> Hello,
> >>
> >> I'm doing some tests with fio from a qemu 1.2 guest (virtio disk, cache=none), randread, with a 4K block size on a small size of 1G (so it can be handled by the buffer cache on the ceph cluster):
> >>
> >> fio --filename=/dev/vdb --rw=randread --bs=4K --size=1000M --iodepth=40 --group_reporting --name=file1 --ioengine=libaio --direct=1
> >>
> >> I can't get more than 5000 IOPS.
> >
> > Have you tried increasing the iodepth?
> >
> > sage
> >
> >> RBD cluster is:
> >> ---------------
> >> 3 nodes, each node with:
> >> - 6 x OSD on 15k drives (xfs), journal on tmpfs, 1 mon
> >> - CPU: 2x 4-core Intel Xeon E5420 @ 2.5GHz
> >> rbd 0.53
> >>
> >> ceph.conf:
> >>
> >> journal dio = false
> >> filestore fiemap = false
> >> filestore flusher = false
> >> osd op threads = 24
> >> osd disk threads = 24
> >> filestore op threads = 6
> >>
> >> kvm host is: 4 x 12-core Opterons
> >> ------------
> >>
> >> During the bench:
> >>
> >> on the ceph nodes:
> >> - CPU is around 10% used
> >> - iostat shows no disk activity on the OSDs (so I think the 1G file is held in the Linux buffer cache)
> >>
> >> on the kvm host:
> >> - CPU is around 20% used
> >>
> >> I really don't see where the bottleneck is....
> >>
> >> Any ideas, hints?
> >>
> >> Regards,
> >>
> >> Alexandre
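
Coming back to the --numjobs suggestion above: with N I/Os in flight the network round trips overlap, so aggregate throughput should grow roughly linearly with N until the backend's limit is reached. A rough sketch of that model in Python (same figures as in the thread; the linear-scaling assumption is mine, and real scaling will be somewhat worse):

    # Hypothetical scaling model: N independent jobs, each serialized at
    # (network RTT + storage service time); the backend caps out at 40k IOPS.
    rtt = 0.1e-3           # seconds
    svc = 0.025e-3         # seconds (a 40k-IOPS backend)
    backend_limit = 40000.0

    for jobs in (1, 2, 4, 8, 16):
        iops = min(jobs / (rtt + svc), backend_limit)
        print("%2d jobs -> ~%.0f IOPS" % (jobs, iops))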