Working guest!

agraf at suse.de (Alexander Graf) · Mon, 4 Jan 2010 07:43:05 +0100

On 03.01.2010, at 22:56, Christoffer Dall wrote:

> 
> 
> On Sun, Jan 3, 2010 at 7:50 AM, Alexander Graf <agraf at suse.de> wrote:
> 
> On 03.01.2010, at 10:29, Christoffer Dall wrote:
> 
> > I finally managed to get a working prompt on the guest. It's not very fast, though: an ls takes around 15 seconds and booting the guest takes about 5 minutes. Compared to QEMU emulation, which takes around 35 minutes, it's an improvement, but of course not yet usable.
> 
> Wow, congratulations!
> 
> > Just wanted to give a quick follow-up on the latest e-mails as well:
> >
> >  - I changed QEMU to synchronize enough registers to give backtraces during guest execution which was a big help for debugging.
> >  - The console was not created because the device IDs were read incorrectly, due to a bug in the emulation code.
> >  - Running the init program introduced some challenges with copy_to_user (and related functions), since on ARM they use special load/store-with-translation instructions (LDRT/STRT).
> >  - Switching to user space introduced a whole new set of problems with domains and access permissions, which essentially requires me to either keep two shadow page tables per process or update a lot of access permissions whenever the guest switches CPU mode.
> >  - I fixed interrupt injection for aborts: I was updating a fault register for both instruction prefetch aborts and data aborts, which broke the guest handler.
> >  - Finally I made some performance improvements in the world-switch code to shorten my debug cycle.
> >
> > I'm probably going to take a short break from development (about three weeks) while I relocate back to Denmark. Afterwards, the plans for the project are (in order):
> >  - Improve performance
> 
> There are two things that would come in handy to figure out what's slowing things down:
> 
> 1) exit stats
> 
> There's an array called debugfs_entries where you just register counters that only ever increase. That works perfectly for information like "the guest exited n times due to a page fault".
> 
> You can then read those values using the kvm_stat script.
> 
> That should give you a pretty good overview of why the guest exits so often.
> 
> 
> 2) exit timings
> 
> Christian implemented a nice tracing framework to measure how much time is spent in the hypervisor for each type of exit. That's a lot more accurate than plain exit counts, because there's a good chance the MMU code takes 50x longer than a simple privileged register read.
> 
> The code is in arch/powerpc/kvm/timing.c.
> 
> Awesome. I contemplated implementing exactly those two things, but figured something might already exist. Good to know where to begin.

As a side note: how do you find privileged instructions that you can't trap on?

Your wiki states that you're basically trying to stop at every jump/call. That sounds pretty slow. Are you still doing that?

Also, can your ARM MMU do NX? On Book3S I always set NX=1 for all pages and simply scanned a page for bad instructions when its NX trap fired. The chances that the OS modifies a page with weird instructions after it has executed it are pretty low. Having data and code overlap in a kernel page isn't that common either (FWIW).

Alex