Re: Guest sync write iops so poor.

On 25 Feb 2016 1:47 pm, Jan Schermer <jan@xxxxxxxxxxx> wrote:


> On 25 Feb 2016, at 14:39, Nick Fisk <nick@xxxxxxxxxx> wrote:
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Huan Zhang
>> Sent: 25 February 2016 11:11
>> To: josh.durgin@xxxxxxxxxxx
>> Cc: ceph-users <ceph-users@xxxxxxxx>
>> Subject: Guest sync write iops so poor.
>>
>> Hi,
>>   We test sync IOPS with fio sync=1 for database workloads in a VM;
>> the backend is librbd and Ceph (an all-SSD setup).
>>   The result is sad to me: we only get ~400 IOPS sync randwrite from
>> iodepth=1 to iodepth=32.
>>   But testing on a physical machine with fio ioengine=rbd sync=1, we can
>> reach ~35K IOPS. It seems the QEMU rbd driver is the bottleneck.
>>   The QEMU version is 2.1.2 with rbd_aio_flush patched.
>>   rbd cache is off, qemu cache=none.
>>
>> So what's wrong with it? Is that normal? Could you give me some help?
>
> Yes, this is normal at QD=1. As the write needs to be acknowledged by both replica OSDs across a network connection, the round-trip latency severely limits you compared to travelling along a 30cm SATA cable.
>
> The two biggest contributors to latency are the network and the speed at which the CPU can process the Ceph code. To improve performance, look at these two areas first. An easy win is to disable debug logging in Ceph.
>
> However, this number should scale as you increase the QD, so something is not right if you are seeing the same performance at QD=1 as at QD=32.

Are you sure?

Ah, sorry, it's sync and not direct IO. Yes, you are right, it will not scale; 400 IOPS at all queue depths is correct.
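A quick sanity check on these figures (plain arithmetic, nothing Ceph-specific): at queue depth 1 each sync write must be acknowledged before the next is issued, so IOPS is just the reciprocal of per-write latency, and the two reported results imply very different round-trip times:

```python
def latency_ms_from_iops(iops: float) -> float:
    # At QD=1 every write blocks until acknowledged, so
    # per-write latency (ms) = 1000 / IOPS.
    return 1000.0 / iops

# ~400 IOPS in the guest implies ~2.5 ms per acknowledged write,
# plausible for two replica acks over a network.
print(latency_ms_from_iops(400))                       # ms per write

# ~35K IOPS would imply ~29 microseconds per replicated network
# write, which supports the suspicion below that sync=1 was not
# actually taking effect in the ioengine=rbd run.
print(round(latency_ms_from_iops(35_000) * 1000, 1))   # us per write
```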

Unless something (the IO elevator) coalesces the writes, they should be serialized and blocking, so QD doesn't necessarily help there. Either way, if you reach higher IOPS with QD>1 you're benchmarking the elevator and not RBD, IMO.
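The serialized-and-blocking point can be sketched as a toy model (illustrative only, not a benchmark): queue depth only raises throughput when writes can actually be in flight concurrently, which sync semantics forbid:

```python
def sync_iops(latency_ms: float, queue_depth: int) -> float:
    # Serialized, blocking writes: queue depth is irrelevant because
    # write N+1 is not issued until write N is acknowledged.
    return 1000.0 / latency_ms

def async_iops(latency_ms: float, queue_depth: int) -> float:
    # Independent in-flight writes (Little's law): throughput scales
    # with queue depth until some other resource saturates.
    return queue_depth * 1000.0 / latency_ms

LATENCY_MS = 2.5  # ~2.5 ms per replicated write, as observed above

print(sync_iops(LATENCY_MS, 1), sync_iops(LATENCY_MS, 32))  # 400.0 400.0
print(async_iops(LATENCY_MS, 32))                           # 12800.0
```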

35K IOPS with ioengine=rbd sounds like the "sync=1" option doesn't actually work, or it's not touching the same object (but I wonder whether write ordering is preserved at that rate?).

400 IOPS is sadly the same figure I can reach on a raw device... testing on a filesystem you can easily drop below 200 IOPS (because of journal and metadata writes, but again, then you're benchmarking filesystem journal and IO elevator efficiency, not RBD itself).

Jan
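For anyone wanting to reproduce the guest-side numbers, a minimal fio job file along these lines would exercise both cases; the filename is a placeholder and ioengine=libaio is an assumption, since the thread does not state which engine was used inside the guest:

```ini
; sketch of a guest-side test; filename and runtime are placeholders
[global]
ioengine=libaio
filename=/dev/vdb
rw=randwrite
bs=4k
direct=1
sync=1            ; O_SYNC: each write must reach stable storage
runtime=60
time_based=1

[qd1]
iodepth=1
stonewall

[qd32]
iodepth=32        ; per the discussion, sync=1 should prevent scaling here
stonewall
```

Comparing the IOPS of the two jobs directly answers the scaling question raised above: with sync=1 honoured, both should report roughly the same figure.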



