On 25 Feb 2016 1:47 pm, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> Are you sure?
Ah, sorry. It's sync, not direct I/O. Yes, you are right, it will not
scale; ~400 IOPS at every queue depth is the expected result.
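To make the sync vs. direct distinction concrete, here is a minimal fio job
sketch, assuming a hypothetical guest block device at /dev/vdb (note that
randwrite against a raw device is destructive):

[global]
# hypothetical guest block device; adjust to the real disk
filename=/dev/vdb
rw=randwrite
bs=4k
runtime=60
time_based

# sync=1 opens the device O_SYNC: each write must be acknowledged as stable
# before it returns, so the full round-trip latency is paid on every I/O
[sync-randwrite]
ioengine=psync
sync=1

# direct=1 only bypasses the page cache; with libaio many writes can be in
# flight at once, so this is the case that can scale with iodepth
[direct-randwrite]
stonewall
ioengine=libaio
direct=1
iodepth=32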
Quoting Jan Schermer <jan@xxxxxxxxxxx>
On 25 Feb 2016, at 14:39, Nick Fisk <nick@xxxxxxxxxx> wrote:
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
Huan Zhang
Sent: 25 February 2016 11:11
To: josh.durgin@xxxxxxxxxxx
Cc: ceph-users <ceph-users@xxxxxxxx>
Subject: Guest sync write iops so poor.
Hi,
We are testing sync IOPS with fio sync=1 for database workloads in a VM;
the backend is librbd and Ceph (an all-SSD setup).
The result is disappointing: we only get ~400 IOPS of sync randwrite,
anywhere from iodepth=1 to iodepth=32.
But testing on a physical machine with fio ioengine=rbd and sync=1, we can
reach ~35K IOPS, so the QEMU RBD path seems to be the bottleneck.
The QEMU version is 2.1.2 with the rbd_aio_flush patch applied;
rbd cache is off and qemu cache=none.
So what's wrong with it? Is that normal? Could you give me some help?
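For reference, "rbd cache is off, qemu cache=none" corresponds to something
like the following on the hypervisor (pool name, image name and client id
here are placeholders):

# ceph.conf, client side
[client]
rbd cache = false

# QEMU drive using librbd, with the host page cache bypassed
-drive format=raw,if=virtio,cache=none,file=rbd:rbd/vm-disk:id=admin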
Yes, this is normal at QD=1. As each write needs to be acknowledged by
both replica OSDs across a network connection, the round-trip latency
limits you severely compared with travelling along a 30 cm SATA cable.
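To put a rough number on it: at QD=1 the achievable IOPS is roughly the
inverse of the per-write round-trip latency, so ~400 IOPS corresponds to
about 2.5 ms per acknowledged write (1 s / 400), most of which is network
hops and OSD processing rather than the SSD media itself.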
The two biggest contributors to latency are the network and the speed at
which the CPU can process the Ceph code, so to improve performance look at
these two areas first. An easy win is to disable debug logging in Ceph.
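As a sketch, that usually means zeroing the debug levels in ceph.conf; the
exact set of subsystems worth silencing varies, for example:

[global]
debug ms = 0/0
debug osd = 0/0
debug filestore = 0/0
debug journal = 0/0
debug auth = 0/0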
However, this number should scale as you increase the QD, so something is
not right if you are seeing the same performance at QD=1 as at QD=32.
Are you sure? Unless something (the I/O elevator) coalesces the writes,
they should be serialized and blocking, so QD doesn't necessarily help
there. Either way, if you reach higher IOPS with QD>1 you are benchmarking
the elevator and not RBD, IMO.
35K IOPS with ioengine=rbd sounds like the "sync=1" option doesn't
actually work. Or it's not touching the same object (but I wonder
whether write ordering is preserved at that rate?).
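For comparison, the physical-machine test was presumably something like the
job below (pool, image and client names are placeholders). If the rbd
engine silently ignores sync=1, the writes are effectively asynchronous,
which would explain a figure like 35K:

[rbd-sync-randwrite]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=test-image
rw=randwrite
bs=4k
sync=1
iodepth=1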
400 IOPS is sadly about the same figure I can reach on a raw device...
testing through a filesystem you can easily end up below 200 IOPS (because
of the journal, metadata... but again, then you're benchmarking the
filesystem journal and I/O elevator efficiency, not RBD itself).
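If anyone wants to repeat that comparison, it is the same sync job pointed
at a raw device versus a file on a filesystem (paths and size are examples
only):

# raw device, no filesystem in the way (destructive to the device)
fio --name=raw-sync --filename=/dev/vdb --rw=randwrite --bs=4k \
    --ioengine=psync --sync=1 --runtime=60 --time_based

# same job against a file; the filesystem journal and metadata updates
# add their own sync writes on top
fio --name=fs-sync --filename=/mnt/test/fio.dat --size=1G --rw=randwrite \
    --bs=4k --ioengine=psync --sync=1 --runtime=60 --time_based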
Jan
Thanks very much.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com