Working guest!

cd2436 at columbia.edu (Christoffer Dall) · Mon, 4 Jan 2010 13:00:17 -0500

On Mon, Jan 4, 2010 at 1:43 AM, Alexander Graf <agraf at suse.de> wrote:

>
> On 03.01.2010, at 22:56, Christoffer Dall wrote:
>
>
>
> On Sun, Jan 3, 2010 at 7:50 AM, Alexander Graf <agraf at suse.de> wrote:
>
>>
>> On 03.01.2010, at 10:29, Christoffer Dall wrote:
>>
>> > I finally managed to get a working prompt on the guest. It's not too
>> > quick though. An ls operation takes around 15 seconds and it takes about 5
>> > minutes to boot the guest. Compared to QEMU emulation, which takes around 35
>> > minutes it's an improvement, but of course not usable.
>>
>> Wow, congratulations!
>>
>> > Just wanted to give a quick follow-up on the latest e-mails as well:
>> >
>> >  - I changed QEMU to synchronize enough registers to give backtraces
>> >    during guest execution which was a big help for debugging.
>> >  - The console was not created because the device ID's were incorrectly
>> >    read, because there was a bug in the emulation code.
>> >  - Running the init program introduced some challenges with copy_to_user
>> >    (and related), since they use some special load with translation
>> >    instructions on ARM.
>> >  - Switching to user space introduced a whole new set of problems with
>> >    domains and access permissions, which essentially requires me to keep around
>> >    two shadow page tables per process or do a lot of updating of access
>> >    permissions when the guest switches cpu mode.
>> >  - I fixed interrupt injection for aborts where I updated a fault
>> >    register for both instruction prefetch aborts and data aborts, which broke
>> >    the guest handler.
>> >  - Finally I made some performance improvements in the world-switch code
>> >    to shorten my debug cycle.
>> >
>> > I'm probably going to take a small break from the development work (like
>> > three weeks or so) while I relocate back to Denmark. Afterwards the plans
>> > with the project are (in order):
>> >  - Improve performance
>>
>> There are two things that would come in handy to figure out what's slowing
>> things down:
>>
>> 1) exit stats
>>
>> There's an array called debugfs_entries where you just put in variables
>> that are monotonically increasing. That works out perfectly fine for things
>> like "guest exited n times due to page fault" kind of information.
>>
>> You can then read those values using the kvm_stat script
>>
>> That should give you a pretty good overview on why the guest exits so
>> often.
>>
>>
>> 2) exit timings
>>
>> Christian implemented a nice tracing framework to measure how much time
>> was spent in the hypervisor due to different exits. That's a lot more
>> accurate than simple exit numbers, because there's a good chance the MMU
>> code takes 50x longer than a simple privileged register read.
>>
>> The code is in arch/powerpc/kvm/timing.c.
>>
>
> Awesome. I contemplated implementing exactly the two things above, but
> figured there might be something out there already. Good to know where to
> begin.
>
>
> As a sidenote - how do you find privileged instructions that you can't trap
> on?
>

For this development I used a Python script to patch the Linux kernel
source code, and I had to manually change two or three files that used
PC-relative addressing. So, why not paravirtualize? Well, the idea is to
patch the instructions in the binary on a 1-to-1 basis, as described on the
wiki, and hope that the performance hit won't be too bad. As you say, if we
assume that the kernel code is not super self-modifying we should be OK. (We
can also catch instruction cache flushes and redo our patching at that
point.) The source patching was just easier to debug, since I could follow
branches with GDB instead of a system call that would take me somewhere
mysterious :)
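
To make the 1-to-1 binary patching idea concrete, here is a minimal sketch
in Python. It is not the actual patching script. The pattern table, the
function names and the choice of an SVC/SWI immediate as an index into a
table of saved instructions are all assumptions made for illustration, and
the two encodings listed (MRS and the register form of MSR) should be
checked against the ARM Architecture Reference Manual before being relied
on. A real patcher would also have to tell its own SWIs apart from the
guest's system calls and avoid rewriting literal pools, which this sketch
ignores.

```python
import struct

# Illustrative, non-exhaustive (mask, value) pairs for ARM instructions that
# are sensitive but do not trap in user mode. Assumed for this sketch.
SENSITIVE_PATTERNS = [
    (0x0FBF0FFF, 0x010F0000),  # MRS Rd, CPSR/SPSR
    (0x0FB0FFF0, 0x0120F000),  # MSR CPSR/SPSR_<fields>, Rm
]
SVC_OPCODE = 0x0F000000  # bits 27..24 set; low 24 bits are the immediate


def is_sensitive(insn):
    """Return True if the 32-bit word matches one of the patterns above."""
    if insn >> 28 == 0xF:
        # Condition field 0b1111 is the unconditional instruction space,
        # where these masks would give false matches.
        return False
    return any((insn & mask) == value for mask, value in SENSITIVE_PATTERNS)


def patch_text(code):
    """Replace each sensitive instruction with an SVC, one word for one word.

    Returns (patched_bytes, originals). The SVC immediate is the index of
    the replaced instruction in `originals`, so a hypothetical trap handler
    could look up what to emulate. The condition field is preserved.
    """
    if len(code) % 4:
        raise ValueError("ARM code must be a multiple of 4 bytes")
    count = len(code) // 4
    words = list(struct.unpack("<%dI" % count, code))
    originals = []
    for i, insn in enumerate(words):
        if is_sensitive(insn):
            index = len(originals)
            if index > 0xFFFFFF:
                raise OverflowError("more patches than SVC immediates")
            originals.append(insn)
            words[i] = (insn & 0xF0000000) | SVC_OPCODE | index
    return struct.pack("<%dI" % count, *words), originals
```

Because every replacement is exactly one word, no branch targets or
PC-relative offsets move. For example, `mrs r0, cpsr` (0xE10F0000) as the
first patched instruction becomes 0xEF000000, and the original word is
kept at index 0 of the returned table.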

>
> Your wiki states that you're basically trying to stop at every jump/call.
> That sounds pretty slow. Are you still doing that?
>
> Also, can your ARM MMU do NX? On Book3S I simply scanned pages for bad
> instructions on the NX trap and always set NX=1 for all pages. Chances that
> the OS modifies itself with weird instructions after it executed a page are
> pretty low. Having data and code overlapping in a kernel page isn't that
> common either (FWIW).
>

ARM v6 and up has the NX flag, but I've based my work on the Android
emulator for now, which is v5. The next step is to make this work on ARM
v6. v6 offers physically tagged caches, so we can avoid cache flushes when
switching page tables during traps, and ASIDs, so we can avoid TLB flushes.
Since v6 has the NX flag, that might be a really good approach for the
binary patching.
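
As a rough illustration of how the NX approach could drive the patching,
here is a small Python model of the scan-on-first-execute scheme you
describe. It only simulates the bookkeeping. The ShadowPage class, the
handler names and the single MRS pattern are hypothetical, and re-arming
NX on an instruction cache flush is my assumption about how the two ideas
would be combined, not something that exists in the code today.

```python
import struct

# Single illustrative pattern (MRS Rd, CPSR/SPSR); a real scanner would
# need the full list of sensitive instructions.
MRS_MASK, MRS_VALUE = 0x0FBF0FFF, 0x010F0000


class ShadowPage:
    """Hypothetical model of one guest page as seen through shadow tables."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.nx = True  # every page starts out non-executable


def handle_exec_fault(page, patch):
    """On an NX fault: scan the page, rewrite matches, make it executable.

    `patch` maps an original instruction word to its replacement word.
    Returns the number of instructions rewritten.
    """
    patched = 0
    for off in range(0, len(page.data) - 3, 4):
        (insn,) = struct.unpack_from("<I", page.data, off)
        if insn >> 28 != 0xF and (insn & MRS_MASK) == MRS_VALUE:
            struct.pack_into("<I", page.data, off, patch(insn))
            patched += 1
    page.nx = False  # later fetches from this page no longer fault
    return patched


def handle_icache_flush(page):
    """Assumed hook: code may have changed, so force a rescan on next fetch."""
    page.nx = True
```

The point of the design is that a page is scanned once, when the guest
first executes from it, and only rescanned if the guest signals that it
changed code.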

I will likely end up dropping v5 support, since it will be too slow, and
making v6 the minimum supported architecture.

>
> Alex
>