tl;dr: This patch adds a new ioctl to KVM on s390x for reading and writing virtual guest memory. The ioctl takes the so-called IPTE-lock into account (a locking mechanism on s390x that lets the host walk the MMU tables of the guest safely).

Long story: Certain instruction interception handlers in QEMU have to access the memory of the guest, either to retrieve additional parameters/data or to supply results to the guest. On s390x, some of these instructions (e.g. MSCH, SSCH, STSCH, ...) are specified to use logical (i.e. virtual) addresses, so the addresses are subject to MMU translation. The current handlers in target-s390x/ioinst.c only work "by accident", since the Linux kernel on s390x uses a 1:1 MMU mapping for kernel memory. For correct behaviour we have to do an MMU page table walk in these handlers first.

Now on s390x, there's another specialty for the case where the host has to walk the MMU tables of the guest: While doing the page table walk (or while accessing the memory of the guest in bigger, non-atomic chunks that span multiple pages), there is a small chance that another CPU zaps or changes the MMU mappings in between, which could result in undefined behaviour. To avoid such problems, the SIE facility features a locking mechanism, the so-called IPTE-lock, which prevents other virtual CPUs from issuing the IPTE (invalidate page table entry) or similar instructions. While the lock is held, these instructions are intercepted, so that their execution can be delayed until the page table walk / memory operation has finished on the locking CPU.

The kernel part of KVM on s390x already uses this locking mechanism for the interception handlers in the kernel (e.g. in the read_guest() and write_guest() functions). For proper MMU page table walk support in QEMU, the IPTE-lock now somehow has to be made available to userspace, too.

However, providing this lock directly to userspace would be quite ugly, since we would then have to deal with a lot of cumbersome conditions (how should the kernel behave if userspace holds the lock for too long, or forgets to release it again, etc.). Additionally, there is another s390x specialty pending - proper handling of the so-called storage keys when accessing the guest memory - which is also best done in kernel space instead of user space (I can elaborate more on that topic on request).

So I decided to introduce a simple ioctl for reading and writing virtual guest memory instead of exporting the lock itself to userspace. Userspace (QEMU) can then simply call this ioctl when it wants to read from or write to virtual guest memory. The kernel then takes the IPTE-lock, walks the MMU tables of the guest to find the physical address that corresponds to the virtual address, copies the requested number of bytes from the userspace buffer to guest memory or the other way round, and finally releases the IPTE-lock again.

Does that sound like a viable solution (IMHO it does ;-))? Or should I maybe try to pursue another approach?

Thomas Huth (1):
  KVM: s390: Add MEMOP ioctls for reading/writing guest memory

 Documentation/virtual/kvm/api.txt |   44 +++++++++++++++++++++++++
 arch/s390/kvm/gaccess.c           |   22 +++++++++++++
 arch/s390/kvm/gaccess.h           |    2 +
 arch/s390/kvm/kvm-s390.c          |   63 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          |   21 ++++++++++++
 5 files changed, 152 insertions(+), 0 deletions(-)
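
To illustrate the sequence the proposed ioctl performs (take the lock, translate, copy, release), here is a minimal userspace sketch in C. It is not the code from the patch: the toy one-level page map, the pthread mutex standing in for the IPTE-lock, and the names guest_mem_op(), translate(), page_map and TOY_PAGE_SIZE are all hypothetical and only serve the illustration. The point it tries to show is that the whole multi-page access happens under one lock, so the mapping cannot change between two chunks.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u   /* hypothetical page size for this sketch */
#define TOY_NUM_PAGES 4u

/* Toy guest: "physical" memory plus a one-level virtual->physical map. */
static uint8_t guest_ram[TOY_NUM_PAGES * TOY_PAGE_SIZE];
static int page_map[TOY_NUM_PAGES] = { 2, 0, -1, 1 };  /* -1: not mapped */

/* Stand-in for the IPTE-lock (assumption: the real lock is not a mutex). */
static pthread_mutex_t ipte_lock = PTHREAD_MUTEX_INITIALIZER;

/* Translate one guest virtual address; returns NULL if it is unmapped. */
uint8_t *translate(uint64_t gaddr)
{
    uint64_t vpage = gaddr / TOY_PAGE_SIZE;

    if (vpage >= TOY_NUM_PAGES || page_map[vpage] < 0)
        return NULL;
    return guest_ram + (size_t)page_map[vpage] * TOY_PAGE_SIZE
                     + gaddr % TOY_PAGE_SIZE;
}

/*
 * Copy size bytes between buf and guest virtual memory.
 * Returns 0 on success, -1 on a translation fault. Note that chunks
 * copied before the fault are not rolled back in this sketch.
 */
int guest_mem_op(uint64_t gaddr, void *buf, size_t size, int is_write)
{
    uint8_t *p = buf;
    int ret = 0;

    assert(buf != NULL);
    pthread_mutex_lock(&ipte_lock);          /* 1. take the lock */
    while (size > 0) {
        /* Never cross a page boundary within one chunk. */
        size_t chunk = TOY_PAGE_SIZE - gaddr % TOY_PAGE_SIZE;
        uint8_t *host;

        if (chunk > size)
            chunk = size;
        host = translate(gaddr);             /* 2. walk the page table */
        if (!host) {
            ret = -1;
            break;
        }
        if (is_write)                        /* 3. copy in or out */
            memcpy(host, p, chunk);
        else
            memcpy(p, host, chunk);
        gaddr += chunk;
        p += chunk;
        size -= chunk;
    }
    pthread_mutex_unlock(&ipte_lock);        /* 4. release the lock */
    return ret;
}
```

A caller would use guest_mem_op(addr, buf, len, 1) to write and guest_mem_op(addr, buf, len, 0) to read. An access that starts near the end of one virtual page is split at the page boundary and each part is translated separately, all under the same lock.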