Shadow page tables, shared page and paravirtualization

After the summer break I am again working on the KVM project.

To look at code and commit messages, there is now a cgit web interface to
browse the repositories: http://git.chazy.dk.
The current work is being done in the android repository under the branch
"android-goldfish-2.6.27-kvmrun".

Shadow page tables:
----------------------------
There is new functionality in arch/arm/kvm/arm_mmu.c to manage shadow page
tables and to do page table walks to translate from one address space to the
other. We no longer use Linux's page table management functions due to page
table folding and duplicated 2nd level descriptors. The core design ideas
for this work were:
 - Use as much of the architecture-agnostic code as possible (found in
virt/kvm/kvm_main.c)
 - Create a master shadow page table, which maps in the shared page and the
interrupt vector page (see below)
 - Allocate 4 second-level page tables per page (page size is 4K and
second-level tables are 1K) to minimize the memory footprint per shadow page
table (see the sketch below).
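
To illustrate the last point, here is a minimal sketch of how four 1K
second-level tables can be carved out of a single 4K page. The struct,
bookkeeping and helper name are illustrative assumptions, not the actual
arm_mmu.c code:

#include <linux/gfp.h>
#include <linux/types.h>

/*
 * Sketch only: carve one 4K page into four 1K ARM second-level
 * (coarse) page tables instead of wasting a full page per table.
 */
#define L2_TABLE_SIZE	1024				/* 256 entries * 4 bytes */
#define L2_PER_PAGE	(PAGE_SIZE / L2_TABLE_SIZE)	/* = 4 with 4K pages */

struct kvm_l2_cache {
	unsigned long page;	/* base address of the backing 4K page */
	unsigned int used;	/* number of 1K tables already handed out */
};

static u32 *alloc_shadow_l2_table(struct kvm_l2_cache *cache)
{
	if (!cache->page || cache->used == L2_PER_PAGE) {
		/* current page exhausted (or none yet): grab a fresh one */
		cache->page = get_zeroed_page(GFP_KERNEL);
		if (!cache->page)
			return NULL;
		cache->used = 0;
	}
	/* hand out the next unused 1K slice of the page */
	return (u32 *)(cache->page + cache->used++ * L2_TABLE_SIZE);
}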

Shared page:
------------------
The guest switch code has been completely rewritten, along with the way
interrupts are hijacked. The most important points:
 - arch/arm/kvm/arm_interrupts.S contains assembly code written for the
interrupt vector page, which can be used directly.
 - The shared page code in arm_interrupts.S is completely
position-independent and can thus be easily relocated to the shared page
virtual address.
 - Interrupts are thus hijacked only while the shadow page tables are in
use, simply by mapping in a different vector page.
 - A call to vcpu->arch.run(), which points to the entry point in the
relocated shared page, runs the guest until a trap or interrupt occurs, and
the return value indicates which one. Consequently, we can loop around this
function in the same way QEMU loops around the KVM ioctl call (see the
sketch below).
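
To make that last point concrete, here is a rough sketch of what such a
loop could look like. Only vcpu->arch.run() is taken from the description
above; the argument it receives, the exit codes and the handler names are
hypothetical:

#include <linux/errno.h>
#include <linux/kvm_host.h>

/*
 * Hypothetical dispatch loop around the shared-page entry point.
 * EXIT_IRQ/EXIT_TRAP and the handlers are made up for illustration.
 */
static int kvm_arm_run_guest(struct kvm_vcpu *vcpu)
{
	int exit;

	for (;;) {
		/* enter the guest through the relocated shared page */
		exit = vcpu->arch.run(vcpu);

		switch (exit) {
		case EXIT_IRQ:		/* hypothetical: hardware interrupt */
			handle_hardware_irq(vcpu);
			break;
		case EXIT_TRAP:		/* hypothetical: guest trapped */
			if (kvm_emulate_trap(vcpu) < 0)
				return -EINVAL;
			break;
		default:		/* anything else goes back to userspace */
			return exit;
		}
	}
}

This mirrors how QEMU loops around the KVM_RUN ioctl, returning to
userspace only when the exit cannot be handled in the kernel.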

Paravirtualization:
------------------------
It turned out that the relocation of the kernel during guest boot would
cause binary-patched instructions to be relocated as well, and there was no
apparent easy way to track this process cleanly.
Instead I have written a small Python script, which goes through the kernel
source and patches the source files in the following way:
 - Before every occurrence of a sensitive instruction, an SWI instruction
with an identifiable code is inserted.
 - On every such trap, the KVM code can examine the following instruction,
emulate it and fast-forward the PC by 4 bytes, skipping the sensitive
instruction (see the sketch after this list).
 - The patching code supports assembly files and inline assembly in C files.
 - If there is a label associated with the sensitive assembly instruction
line, the label is moved to the SWI instruction.
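
As an example of how the trap side could look: the handler is entered from
the SWI vector with the sensitive instruction (for example an mrs that
reads the CPSR) sitting right after the inserted SWI. The register-array
layout and the helpers below (arch.regs, kvm_read_guest_insn,
kvm_emulate_sensitive) are illustrative assumptions, not the actual code in
the repository:

#include <linux/errno.h>
#include <linux/kvm_host.h>

static int handle_pv_swi(struct kvm_vcpu *vcpu)
{
	/* saved guest PC; after an SWI it points at the instruction
	 * following the trap, i.e. the sensitive instruction itself */
	unsigned long pc = vcpu->arch.regs[15];
	u32 insn;

	if (kvm_read_guest_insn(vcpu, pc, &insn))
		return -EFAULT;

	/* emulate the sensitive instruction ... */
	if (kvm_emulate_sensitive(vcpu, insn))
		return -EINVAL;

	/* ... and fast-forward the PC by 4 bytes so the guest resumes
	 * after it instead of re-executing it */
	vcpu->arch.regs[15] = pc + 4;
	return 0;
}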

I am curious if anyone can see a problem with the paravirtualization
approach mentioned above. My only concern is code that counts on a fixed
number of instructions somewhere and does PC-relative addressing across the
patched region; inserting an SWI would break that, but I doubt we will ever
see code like that in the Linux kernel.


Best,
Christoffer