QEMU and MMIO regions:
------------------------------------
I think I have solved the first hurdle regarding QEMU<->KVM I/O mappings. When QEMU registers a memory region with its virtual CPU that is never actually allocated as user memory, the KVM interaction code sees the region as IO_MEM_UNASSIGNED and never registers it with KVM. However, KVM already has an API for registering this type of memory region: the IOCTL called KVM_SET_MEMORY_REGION (as opposed to KVM_SET_USER_MEMORY_REGION). This IOCTL must be implemented in the kernel per host architecture and handled accordingly.

I have done this by registering the memory regions with KVM as usual, but setting the existing user_alloc flag to 0 on the kvm_memory_slot structure. Thus, on a shadow page fault, instead of immediately trying to translate the guest physical address to a host virtual address, I first find the corresponding memory slot and check whether it has user_alloc set. If it does, I translate to the host virtual address and add the mapping to the shadow page tables; otherwise, I treat the access as an MMIO operation.

The MMIO operation itself seems fairly straightforward: set vcpu->run->exit_reason = KVM_EXIT_MMIO and fill in the vcpu->run->mmio structure. The fields in the mmio struct are:
 - phys_addr
 - data
 - len
 - is_write

To fill in this data (particularly data and len) I see no way around analyzing the faulting instruction and emulating its behavior, which leads me to the next point.

Emulation code re-factoring:
--------------------------------------
There is quite a lot of well-working code in the arch/arm/kvm/arm_emulate.c file, which is currently used mainly for emulating co-processor and status-register instructions, but there is some support for load-store and data-processing instructions in there as well. Unfortunately, the code doesn't follow any kernel coding standards and is generally hard to read and follow.
What's worse is that the code is not sufficiently modularized to be used to analyze instructions in the MMIO case. For these reasons I am trying to re-factor the code somewhat.

I can think of these scenarios in which it will be relevant to emulate instructions:
 - When a sensitive instruction is executed and the guest is in priv. mode
 - Load-store instructions accessing MMIO regions
 - Load-store instructions accessing reserved regions such as the interrupt vector page

The sensitive instructions can be divided into these categories:
 - Data-processing instructions with S bit set and PC as destination (changes priv. mode)
 - Program status register instructions
 - Co-processor instructions
 - Load-store instructions that modify user-mode registers (when guest is in priv. mode)

If anybody can think of other cases where we need to emulate instructions, now is the time to speak up :)

The design I'm working on for this code is to split the emulation code into small functions as much as possible and to make them as state-independent as possible. For instance, instead of simply passing the vcpu struct pointer to a function called emulate_instr, which then modifies the state of the vcpu (as is done now), there would be several smaller functions in the style of the suggestion below. Any suggestions on improving this are most welcome:

Emulation example code:
----------------------------------
/* Decoding helpers: each one only inspects the instruction word. */
u32 decode_ls_address(u32 instr);
u32 get_store_value(u32 instr);
u8 get_load_dest_register(u32 instr);
u32 get_ls_instr_datalen(u32 instr);

/* Only the top-level function touches vcpu state. */
int emulate_ls_instr(struct kvm_vcpu *vcpu, u32 instr)
{
	u32 addr = decode_ls_address(instr);
	hva_t hva = gva_to_hva(addr);

	switch (get_ls_instr_index(instr)) {
	case ARM_LS_INSTR_LDR: {
		/* Copy len bytes from host memory into the destination register. */
		void *dest = vcpu->regs + get_load_dest_register(instr);
		u32 len = get_ls_instr_datalen(instr);

		memcpy(dest, (void *)hva, len);
		break;
	}
	case ......
		....
	...
	return 0;
}

Thanks,
Christoffer
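
A minimal sketch of how the len, is_write and data fields discussed in the MMIO section could be derived from a faulting instruction. Everything here is an assumption made for illustration: struct mmio_request is a userspace stand-in for the mmio fields, not the real kvm_run layout, and fill_mmio_request and its helpers are hypothetical names. It only handles immediate-offset word/unsigned-byte LDR/STR in ARM state, takes the faulting guest physical address from the caller, and ignores special cases such as storing the PC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the mmio fields listed in the mail. */
struct mmio_request {
	uint64_t phys_addr;
	uint8_t  data[8];
	uint32_t len;
	uint8_t  is_write;
};

#define LS_BIT_BYTE (1u << 22)	/* B bit: 1 = byte access, 0 = word */
#define LS_BIT_LOAD (1u << 20)	/* L bit: 1 = load, 0 = store */

/* Immediate-offset LDR/STR{B}: bits 27:25 == 010, cond != 0b1111. */
static int is_imm_load_store(uint32_t instr)
{
	return ((instr >> 25) & 0x7) == 0x2 && (instr >> 28) != 0xf;
}

/* Access width in bytes, taken from the B bit. */
static uint32_t ls_datalen(uint32_t instr)
{
	return (instr & LS_BIT_BYTE) ? 1 : 4;
}

/* Rd (bits 15:12): the register that is loaded or stored. */
static uint8_t ls_rd(uint32_t instr)
{
	return (instr >> 12) & 0xf;
}

/*
 * Fill out from the instruction word alone, plus the register file for
 * the store value. Returns 0 on success, -1 if the instruction is not
 * one this sketch understands.
 */
static int fill_mmio_request(uint32_t instr, uint64_t fault_ipa,
			     const uint32_t regs[16],
			     struct mmio_request *out)
{
	uint32_t i, val;

	assert(out != NULL);
	if (!is_imm_load_store(instr))
		return -1;

	memset(out, 0, sizeof(*out));
	out->phys_addr = fault_ipa;
	out->len = ls_datalen(instr);
	out->is_write = !(instr & LS_BIT_LOAD);

	if (out->is_write) {
		/* Store: copy the low len bytes of Rd, least significant first. */
		val = regs[ls_rd(instr)];
		for (i = 0; i < out->len; i++)
			out->data[i] = (val >> (8 * i)) & 0xff;
	}
	return 0;
}
```

For a load, data is left zeroed here, on the assumption that userspace fills it in and the destination register is written once the exit has been handled. Keeping the decoders free of vcpu state means the same helpers could serve both the MMIO path and the sensitive-instruction path.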