Re: Guest sync write iops so poor.

On 25 Feb 2016 1:47 pm, Jan Schermer <jan@xxxxxxxxxxx> wrote:


> On 25 Feb 2016, at 14:39, Nick Fisk <nick@xxxxxxxxxx> wrote:
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Huan Zhang
>> Sent: 25 February 2016 11:11
>> To: josh.durgin@xxxxxxxxxxx
>> Cc: ceph-users <ceph-users@xxxxxxxx>
>> Subject: Guest sync write iops so poor.
>>
>> Hi,
>>   We test sync IOPS with fio sync=1 for database workloads in a VM;
>> the backend is librbd and Ceph (an all-SSD setup).
>>   The result is sad to me: we only get ~400 IOPS sync randwrite from
>> iodepth=1 to iodepth=32.
>>   But testing on a physical machine with fio ioengine=rbd sync=1, we can
>> reach ~35K IOPS. It seems the QEMU rbd driver is the bottleneck.
>>   The QEMU version is 2.1.2 with rbd_aio_flush patched.
>>   rbd cache is off, qemu cache=none.
>>
>> So what's wrong with it? Is that normal? Could you give me some help?
>
> Yes, this is normal at QD=1. As the write needs to be acknowledged by both replica OSDs across a network connection, the round-trip latency severely limits you compared to travelling along a 30cm SATA cable.
>
> The two biggest contributors to latency are the network and the speed at which the CPU can process the Ceph code. To improve performance, look at these two areas first. An easy win is to disable debug logging in Ceph.
>
> However, this number should scale as you increase the QD, so something is not right if you are seeing the same performance at QD=1 as at QD=32.

Are you sure?

Ah, sorry, it's sync and not direct IO. Yes, you are right, it will not scale; 400 IOPS at all queue depths is correct.
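A quick sanity check on these figures (plain arithmetic, nothing Ceph-specific): at queue depth 1 each sync write must be acknowledged before the next is issued, so IOPS is just the reciprocal of per-write latency, and the two reported results imply very different round-trip times:

```python
def latency_ms_from_iops(iops: float) -> float:
    # At QD=1 every write blocks until acknowledged, so
    # per-write latency (ms) = 1000 / IOPS.
    return 1000.0 / iops

# ~400 IOPS in the guest implies ~2.5 ms per acknowledged write,
# plausible for two replica acks over a network.
print(latency_ms_from_iops(400))                       # ms per write

# ~35K IOPS would imply ~29 microseconds per replicated network
# write, which supports the suspicion below that sync=1 was not
# actually taking effect in the ioengine=rbd run.
print(round(latency_ms_from_iops(35_000) * 1000, 1))   # us per write
```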

Unless something (the IO elevator) coalesces the writes, they should be serialized and blocking, so QD doesn't necessarily help there. Either way, if you reach higher IOPS with QD>1 you're benchmarking the elevator and not RBD, IMO.
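The serialized-and-blocking point can be sketched as a toy model (illustrative only, not a benchmark): queue depth only raises throughput when writes can actually be in flight concurrently, which sync semantics forbid:

```python
def sync_iops(latency_ms: float, queue_depth: int) -> float:
    # Serialized, blocking writes: queue depth is irrelevant because
    # write N+1 is not issued until write N is acknowledged.
    return 1000.0 / latency_ms

def async_iops(latency_ms: float, queue_depth: int) -> float:
    # Independent in-flight writes (Little's law): throughput scales
    # with queue depth until some other resource saturates.
    return queue_depth * 1000.0 / latency_ms

LATENCY_MS = 2.5  # ~2.5 ms per replicated write, as observed above

print(sync_iops(LATENCY_MS, 1), sync_iops(LATENCY_MS, 32))  # 400.0 400.0
print(async_iops(LATENCY_MS, 32))                           # 12800.0
```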

35K IOPS with ioengine=rbd sounds like the "sync=1" option doesn't actually work, or it's not touching the same object (but I wonder whether write ordering is preserved at that rate?).

400 IOPS is sadly the same figure I can reach on a raw device... testing on a filesystem you can easily drop below 200 IOPS (because of journal and metadata writes, but again, then you're benchmarking filesystem journal and IO elevator efficiency, not RBD itself).

Jan
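For anyone wanting to reproduce the guest-side numbers, a minimal fio job file along these lines would exercise both cases; the filename is a placeholder and ioengine=libaio is an assumption, since the thread does not state which engine was used inside the guest:

```ini
; sketch of a guest-side test; filename and runtime are placeholders
[global]
ioengine=libaio
filename=/dev/vdb
rw=randwrite
bs=4k
direct=1
sync=1            ; O_SYNC: each write must reach stable storage
runtime=60
time_based=1

[qd1]
iodepth=1
stonewall

[qd32]
iodepth=32        ; per the discussion, sync=1 should prevent scaling here
stonewall
```

Comparing the IOPS of the two jobs directly answers the scaling question raised above: with sync=1 honoured, both should report roughly the same figure.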



