On 03.01.2010, at 10:29, Christoffer Dall wrote:

> I finally managed to get a working prompt on the guest. It's not
> too quick though. An ls operation takes around 15 seconds and it
> takes about 5 minutes to boot the guest. Compared to QEMU
> emulation, which takes around 35 minutes, it's an improvement, but
> of course not usable.

Wow, congratulations!

> Just wanted to give a quick follow-up on the latest e-mails as well:
>
> - I changed QEMU to synchronize enough registers to give backtraces
>   during guest execution, which was a big help for debugging.
> - The console was not created because the device IDs were read
>   incorrectly, due to a bug in the emulation code.
> - Running the init program introduced some challenges with
>   copy_to_user (and related functions), since they use special
>   load-with-translation instructions on ARM.
> - Switching to user space introduced a whole new set of problems
>   with domains and access permissions, which essentially requires
>   me to keep around two shadow page tables per process, or do a lot
>   of updating of access permissions when the guest switches CPU
>   mode.
> - I fixed interrupt injection for aborts: I was updating a fault
>   register for both instruction prefetch aborts and data aborts,
>   which broke the guest handler.
> - Finally, I made some performance improvements in the world-switch
>   code to shorten my debug cycle.
>
> I'm probably going to take a small break from the development work
> (like three weeks or so) while I relocate back to Denmark.
> Afterwards the plans for the project are (in order):
> - Improve performance

There are two things that would come in handy to figure out what's
slowing things down:

1) exit stats

There's an array called debugfs_entries where you just put in
variables that are monotonically increasing. That works out perfectly
fine for "guest exited n times due to page fault" kinds of
information. You can then read those values using the kvm_stat script
(see the first sketch below). That should give you a pretty good
overview of why the guest exits so often.

2) exit timings

Christian implemented a nice tracing framework to measure how much
time was spent in the hypervisor due to different exits. That's a lot
more accurate than simple exit counts, because there's a good chance
the MMU code takes 50x longer than a simple privileged register read.
The code is in arch/powerpc/kvm/timing.c (see the second sketch
below).

Alex
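
A first sketch of the debugfs_entries pattern from 1). The counter
names (dabt_exits and friends) are invented here for a hypothetical
ARM port; the mechanism mirrors what the x86 and powerpc code did at
the time, with VCPU_STAT resolving to an offset into the vcpu's stat
struct:

    /* Per-vcpu exit counters. The field names are illustrative,
     * not from an actual ARM port; each is just a monotonically
     * increasing counter bumped by the exit handlers. */
    struct kvm_vcpu_stat {
        u32 dabt_exits;   /* exits due to guest data aborts */
        u32 pabt_exits;   /* exits due to prefetch aborts */
        u32 mmio_exits;   /* exits handed to QEMU for MMIO */
        u32 cp15_exits;   /* trapped coprocessor accesses */
    };

    #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU

    /* Each entry maps a name shown by kvm_stat to one counter. */
    struct kvm_stats_debugfs_item debugfs_entries[] = {
        { "dabt", VCPU_STAT(dabt_exits) },
        { "pabt", VCPU_STAT(pabt_exits) },
        { "mmio", VCPU_STAT(mmio_exits) },
        { "cp15", VCPU_STAT(cp15_exits) },
        { NULL }
    };

    /* Then, in each exit handler, bump the matching counter: */
    vcpu->stat.dabt_exits++;

kvm_stat reads the counters back out of debugfs and prints per-second
deltas, so whichever exit type dominates the 15-second ls shows up
immediately.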
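
And a second sketch for 2). This is not the actual
arch/powerpc/kvm/timing.c code, just the accounting scheme it
implements: timestamp every exit, classify it, and accumulate
per-type durations so you see average cost, not just frequency. All
names below are made up:

    /* One accumulator per exit type; sum/count gives the average
     * handling cost, which is what bare exit counters can't tell
     * you. */
    enum exit_type { EXIT_DABT, EXIT_PABT, EXIT_MMIO, EXIT_CP15,
                     EXIT_MAX };

    struct exit_timing {
        u64 sum;    /* total time spent handling this type */
        u64 min;    /* fastest instance seen */
        u64 max;    /* slowest instance seen */
        u32 count;  /* number of exits of this type */
    };

    static struct exit_timing timings[EXIT_MAX];

    /* Called on the way back into the guest with the duration of
     * the exit that was just handled. */
    static void account_exit(enum exit_type type, u64 duration)
    {
        struct exit_timing *t = &timings[type];

        t->sum += duration;
        if (t->count == 0 || duration < t->min)
            t->min = duration;
        if (duration > t->max)
            t->max = duration;
        t->count++;
    }

The real timing.c keeps a bit more detail (it also tracks a sum of
squares per type, so the spread can be derived) and exposes the table
through a per-vcpu debugfs file.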