Re: speedup ceph / scaling / find the bottleneck

Stefan Priebe <s.priebe@xxxxxxxxxxxx> · Sun, 01 Jul 2012 23:01:44 +0200

Hello list,
Hello Sage,

I ran some further tests.

Sequential 4k writes over 200GB: 300% CPU usage of the kvm process, 34712 iops

Random 4k writes over 200GB: 170% CPU usage of the kvm process, 5500 iops

Random 4k writes over 100MB: 450% CPU usage of the kvm process
and !! 25059 iops !!

Random 4k writes over 1GB: 380% CPU usage of the kvm process, 14387 iops

So the range over which the random I/O happens seems to be important, and
the CPU usage just seems to reflect the iops.

So I'm not sure whether the problem is really the client rbd driver. Mark, I
hope you can run some tests next week.
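
To check how closely CPU usage follows iops, the four measurements above can be put side by side. This is only a small sketch using the figures from this mail; the helper names are made up for illustration, and the block size of 4096 bytes is assumed from "4k".

```python
# Figures copied from the measurements above: (label, kvm CPU %, iops).
RUNS = [
    ("seq 4k over 200GB", 300, 34712),
    ("rand 4k over 200GB", 170, 5500),
    ("rand 4k over 1GB", 380, 14387),
    ("rand 4k over 100MB", 450, 25059),
]


def iops_per_cpu_percent(cpu_percent, iops):
    """iops delivered per percentage point of kvm CPU usage."""
    return iops / cpu_percent


def throughput_mib_s(iops, block_size=4096):
    """Throughput in MiB/s at a fixed block size (4096 bytes assumed)."""
    return iops * block_size / (1024 * 1024)


for label, cpu, iops in RUNS:
    ratio = iops_per_cpu_percent(cpu, iops)
    mib_s = throughput_mib_s(iops)
    print(f"{label:20s} {ratio:7.1f} iops per CPU%  {mib_s:6.1f} MiB/s")
```

With these numbers the ratio runs from roughly 32 to roughly 116 iops per CPU percentage point, so CPU usage rises with iops but not in strict proportion.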

Greets
Stefan

On 29.06.2012 23:18, Stefan Priebe wrote:
On 29.06.2012 17:28, Sage Weil wrote:
On Fri, 29 Jun 2012, Stefan Priebe - Profihost AG wrote:
On 29.06.2012 13:49, Mark Nelson wrote:
I'll try to replicate your findings in house.  I've got some other
things I have to do today, but hopefully I can take a look next week.
If I recall correctly, in the other thread you said that sequential
writes are using much less CPU time on your systems?

Random 4k writes: 10% idle
Seq 4k writes: !! 99.7% !! idle
Seq 4M writes: 90% idle

I take it 'rbd cache = true'?
Yes

It sounds like librbd (or the guest file
system) is coalescing the sequential writes into big writes.  I'm a bit
surprised that the 4k ones have lower CPU utilization, but there is a lot
of opportunity for noise there, so I wouldn't read too far into it yet.
90% versus 99.7% idle is OK; the 9% goes to the flush, kworker and xfs
processes. That was the overall system load, not just ceph-osd.

  Do you see better scaling in that case?

3 osd nodes:
1 VM:
Rand 4k writes: 7000 iops
<-- This one is WRONG, sorry. It is 14100 iops.

Seq 4k writes: 19900 iops

2 VMs:
Rand 4k writes: 6000 iops each
Seq 4k writes: 4000 iops VM 1
Seq 4k writes: 18500 iops VM 2

4 osd nodes:
1 VM:
Rand 4k writes: 14400 iops      <------ ????

Can you double-check this number?
Triple-checked, BUT I see that the Rand 4k writes number with 3 osd nodes
was wrong. Sorry.

Seq 4k writes: 19000 iops

2 VMs:
Rand 4k writes: 7000 iops each
Seq 4k writes: 18000 iops each

With the exception of that one number above, it really sounds like the
bottleneck is in the client (VM or librbd+librados) and not in the
cluster.  Performance won't improve when you add OSDs if the limiting
factor is the client's ability to dispatch/stream/sustain IOs.  That also
seems consistent with the fact that limiting the # of CPUs on the OSDs
doesn't affect much.
ACK

Above, with 2 VMs, for instance, your total iops for the cluster doubled
(36000 total).  Can you try with 4 VMs and see if it continues to scale
in that dimension?  At some point you will start to saturate the OSDs,
and at that point adding more OSDs should show aggregate throughput
going up.
Where did you get that value from? It scales with VMs in some cases, but
it does not scale with OSDs.
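
The 36000 figure appears to be the sum of the two sequential results with 4 osd nodes (18000 iops per VM). A rough sketch that adds up the per-VM numbers quoted in this thread for each setup is shown here; the table layout and the `aggregate` helper are illustrative only, and the 3-node random value uses the corrected 14100 iops.

```python
# Per-VM iops as quoted in this thread, keyed by (osd_nodes, vms, workload).
RESULTS = {
    (3, 1, "rand"): [14100],  # corrected value (originally reported as 7000)
    (3, 1, "seq"): [19900],
    (3, 2, "rand"): [6000, 6000],
    (3, 2, "seq"): [4000, 18500],
    (4, 1, "rand"): [14400],
    (4, 1, "seq"): [19000],
    (4, 2, "rand"): [7000, 7000],
    (4, 2, "seq"): [18000, 18000],
}


def aggregate(osd_nodes, vms, workload):
    """Total cluster iops: the sum over all VMs in one test setup."""
    return sum(RESULTS[(osd_nodes, vms, workload)])


for workload in ("rand", "seq"):
    for osd_nodes in (3, 4):
        one = aggregate(osd_nodes, 1, workload)
        two = aggregate(osd_nodes, 2, workload)
        print(f"{osd_nodes} osd nodes, {workload} 4k: "
              f"1 VM = {one}, 2 VMs = {two}, ratio = {two / one:.2f}")
```

Summed this way, only the sequential test with 4 osd nodes comes close to doubling (19000 to 36000); the random totals stay around 12000 to 14400 in every setup.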

Stefan
