Re: [PATCH v3 09/13] KVM: PPC: e500: enable magic page

Alexander Graf <agraf@xxxxxxx> · Wed, 15 Jun 2011 23:21:37 +0200

On 15.06.2011, at 22:58, Scott Wood wrote:

> On Wed, 15 Jun 2011 13:34:06 +0200
> Alexander Graf <agraf@xxxxxxx> wrote:
> 
>> 
>> On 15.06.2011, at 12:50, Alexander Graf wrote:
>>> What are your results when using the magic page? I have the following numbers with your patches applied:
>>> 
>>> == bare metal ==
>>> 
>>> root@e500:~/kvm# time for i in {1..1000}; do /bin/echo > /dev/null; done
>>> 
>>> real	0m5.445s
>>> user	0m0.204s
>>> sys	0m0.572s
>>> 
>>> 
>>> == no hypervisor node (magic page not used) ==
>>> 
>>> debian-powerpc:~# time for i in {1..1000}; do /bin/echo > /dev/null; done
>>> 
>>> real	1m36.362s
>>> user	0m13.224s
>>> sys	1m11.084s
>>> 
>>> 
>>> == with hypervisor node (magic page used) ==
>>> 
>>> debian-powerpc:~# time for i in {1..1000}; do /bin/echo > /dev/null; done
>>> 
>>> real	2m28.888s
>>> user	0m9.248s
>>> sys	1m4.016s
>> 
>> Interesting - now I'm down to:
>> 
>> debian-powerpc:~# time for i in {1..1000}; do /bin/echo > /dev/null; done
>> 
>> real	1m25.008s
>> user	0m9.224s
>> sys	1m5.720s
>> 
>> 
>> Oh well, let's hope I did something wrong before :).
> 
> Remember, I have more paravirt patches coming after this (wanted to get the
> MMU stuff dealt with first), and the kernel is still using 4K TLB1 pages in
> the default qemu config.  We should probably use TLB0 when large pages
> aren't available.
> 
> Without paravirt, no large pages:
> 
> sh-2.05b# time for i in $(seq 1000); do /bin/echo > /dev/null ; done
> 
> real    0m42.769s
> user    0m3.256s
> sys     0m34.988s
> 
> With paravirt including my local patches (but still no large pages):

Do these include patches to move the MAS registers to the shared page? That should significantly reduce the number of instruction traps.

> 
> sh-2.05b# time for i in $(seq 1000); do /bin/echo > /dev/null ; done
> 
> real    0m40.339s
> user    0m1.560s
> sys     0m32.652s
> 
> With large pages and no paravirt:
> 
> sh-2.05b# time for i in $(seq 1000); do /bin/echo > /dev/null ;done
> 
> real    0m7.986s

Wow, so this is where all the time gets wasted. Sounds like the guest's kernel eats up all of it. I assume "large pages" means direct map?

> user    0m2.528s
> sys     0m3.232s
> 
> With large pages and paravirt, but just this patchset (no further paravirt
> patches):
> 
> sh-2.05b# time for i in $(seq 1000); do /bin/echo > /dev/null ; done
> 
> real    0m6.067s
> user    0m3.068s
> sys     0m2.332s
> 
> With large pages and all my paravirt patches:

Would you mind giving me a list of the patches you have in the queue? Nothing fancy, just the instructions that you're already looking at.

> sh-2.05b# time for i in $(seq 1000); do /bin/echo  > /dev/null ;done
> 
> real    0m3.837s
> user    0m0.604s
> sys     0m0.316s
> 
> On the host (different rfs, but I think similar in relevant ways, except
> that the host rfs has SPE and guest rfs is soft-float):
> 
> # time for i in $(seq 1000); do /bin/echo > /dev/null ; done
> 
> real    0m1.850s
> user    0m0.028s
> sys     0m0.236s
> 
> I used seq because my rfs is using an older bash that doesn't seem to
> understand the range expression.

Sure, it's very valuable benchmarking data nevertheless! I only use the bash range expression because it's easier to type, and slightly faster.
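
For comparing runs across root filesystems with different bash versions, a loop that avoids both seq and the range expression might be handy. As far as I know, the {1..1000} form only appeared in bash 3.0, which would explain why a 2.05b shell rejects it. The following is only a sketch: bench_loop is a made-up helper name, not something from either of our setups, and it still measures the same fork/exec of /bin/echo per iteration.

```sh
#!/bin/bash
# Hypothetical helper (name is an assumption): run the fork/exec
# micro-benchmark N times using only shell arithmetic, so it does
# not depend on seq or on brace range expansion.
bench_loop() {
    local n=${1:-1000}   # iteration count, defaults to 1000
    local i=0
    while [ "$i" -lt "$n" ]; do
        /bin/echo > /dev/null   # one fork + exec per iteration
        i=$((i + 1))
    done
}

# Same measurement as in the thread: 1000 iterations under time.
time bench_loop 1000
```

The counter is incremented in the shell itself, so unlike the seq variant no extra process is spawned up front, and the loop body is identical on both shells.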

Thanks a lot for these numbers.

Alex
