[PATCH RFC 0/1] KVM: ioctl for reading/writing guest memory

tl;dr:
This patch adds a new ioctl to KVM on s390x for reading and writing from/to
virtual guest memory, to take account of the so-called IPTE-lock on s390x
(a locking mechanism for the host to walk MMU tables of the guest).

Long story:
Certain instruction interception handlers in QEMU have to access the memory
of the guest, either to retrieve additional parameters/data or to supply
results to the guest. On s390x, some of them (e.g. MSCH, SSCH, STSCH, ...)
are specified to use logical (i.e. virtual) addresses in memory, i.e. the
addresses are subject to MMU translation. The current handlers in
target-s390x/ioinst.c just work "by accident" since the Linux kernel on
s390x uses a 1:1 MMU mapping for kernel memory, but for correct behaviour
we have to do an MMU page table walk in these handlers first.

Now on s390x, there's another specialty for the case the host has to walk
the MMU tables of the guest: While doing the page table walk (or while
accessing the memory of the guest in bigger, non-atomic chunks on multiple
pages), there is a small chance that another CPU might zap or change the
MMU mappings in between, so in that case unexpected/undefined behaviour
might occur. To avoid such problems, the SIE facility features a locking
mechanism, the so-called IPTE-lock, which prevents other virtual CPUs from
issuing the IPTE (invalidate page table entry) or similar instructions.
When the lock is being held, these other instructions are intercepted, so
that the execution of the instructions can be delayed until the page table
walk / memory operation has finished on the locking CPU.

The kernel part of KVM on s390x already uses this locking mechanism for
the interception handlers in the kernel (e.g. during the read_guest()
and write_guest() functions). For proper MMU page table walk support
in QEMU, the IPTE-lock now somehow has to be made available to userspace,
too.

However, providing this lock directly to the userspace would be quite
ugly, since we would then need to deal with a lot of cumbersome conditions
(how should the kernel behave if userspace takes the lock for too long
or forgets to free it again etc.). Additionally, there is also another
specialty of s390x pending - proper handling of the so-called storage
keys when accessing the guest memory - which is also done best in
the kernel space instead of user space (I can elaborate more on that
topic on request). So I decided to introduce a simple ioctl for reading
and writing from/to guest memory instead of exporting the lock itself
to userspace.

Userspace (QEMU) can then simply call this ioctl when it wants
to read or write from/to virtual guest memory. The kernel then takes
the IPTE-lock, walks the MMU table of the guest to find out the
physical address that corresponds to the virtual address, copies
the requested number of bytes from the userspace buffer to guest
memory or the other way round, and finally releases the IPTE-lock again.
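
To illustrate the sequence described above (take lock, translate, copy,
release lock), here is a minimal user-space toy model in C. It is only a
sketch under loud assumptions: it is not the code from the patch and not
the KVM API. All names (toy_guest, toy_translate, toy_mem_op, the TOY_*
constants) are hypothetical, the "page table" is a flat array instead of
the real multi-level s390x tables, and the "IPTE-lock" is a plain flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy parameters, not s390x or KVM values. */
#define TOY_PAGE_SIZE 4096u
#define TOY_NUM_PAGES 4u
#define TOY_INVALID   UINT32_MAX

struct toy_guest {
	int ipte_lock_held;                  /* stand-in for the IPTE-lock */
	uint32_t page_table[TOY_NUM_PAGES];  /* virtual page -> physical page */
	uint8_t phys_mem[TOY_NUM_PAGES * TOY_PAGE_SIZE];
};

enum toy_memop { TOY_MEMOP_READ, TOY_MEMOP_WRITE };

/* Translate one virtual address; returns 0 on success, -1 if unmapped. */
static int toy_translate(const struct toy_guest *g, uint64_t vaddr,
			 uint64_t *paddr)
{
	uint64_t vpage = vaddr / TOY_PAGE_SIZE;
	uint32_t ppage;

	if (vpage >= TOY_NUM_PAGES)
		return -1;
	ppage = g->page_table[vpage];
	if (ppage == TOY_INVALID || ppage >= TOY_NUM_PAGES)
		return -1;
	*paddr = (uint64_t)ppage * TOY_PAGE_SIZE + vaddr % TOY_PAGE_SIZE;
	return 0;
}

/*
 * Copy 'size' bytes between 'buf' and guest virtual memory at 'vaddr'.
 * The lock is held across the whole operation, so the mapping cannot
 * change between pages. On a translation failure the copy stops and -1
 * is returned; bytes copied before the failure are not rolled back.
 */
static int toy_mem_op(struct toy_guest *g, enum toy_memop op,
		      uint64_t vaddr, void *buf, size_t size)
{
	uint8_t *ubuf = buf;
	int rc = 0;

	assert(!g->ipte_lock_held);          /* toy model: no nesting */
	g->ipte_lock_held = 1;               /* "take the IPTE-lock" */

	while (size > 0) {
		uint64_t paddr;
		/* Never cross a page boundary within one copy step. */
		size_t chunk = TOY_PAGE_SIZE - (size_t)(vaddr % TOY_PAGE_SIZE);

		if (chunk > size)
			chunk = size;
		if (toy_translate(g, vaddr, &paddr) != 0) {
			rc = -1;
			break;
		}
		if (op == TOY_MEMOP_WRITE)
			memcpy(&g->phys_mem[paddr], ubuf, chunk);
		else
			memcpy(ubuf, &g->phys_mem[paddr], chunk);
		vaddr += chunk;
		ubuf += chunk;
		size -= chunk;
	}

	g->ipte_lock_held = 0;               /* "release the IPTE-lock" */
	return rc;
}
```

The point of the sketch is that lock handling never leaves toy_mem_op():
the caller only passes an address, a buffer and a length, so it cannot
hold the lock for too long or forget to release it. The per-page loop
also shows why a translation is needed for every page touched, since
virtually contiguous pages need not be physically contiguous.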

Does that sound like a viable solution (IMHO it does ;-))? Or should
I maybe try to pursue another approach?

Thomas Huth (1):
  KVM: s390: Add MEMOP ioctls for reading/writing guest memory

 Documentation/virtual/kvm/api.txt |   44 +++++++++++++++++++++++++
 arch/s390/kvm/gaccess.c           |   22 +++++++++++++
 arch/s390/kvm/gaccess.h           |    2 +
 arch/s390/kvm/kvm-s390.c          |   63 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          |   21 ++++++++++++
 5 files changed, 152 insertions(+), 0 deletions(-)
