On 2012-02-13 19:50, Blue Swirl wrote: > On Mon, Feb 13, 2012 at 10:16, Jan Kiszka <jan.kiszka@xxxxxxxxxxx> wrote: >> On 2012-02-11 16:25, Blue Swirl wrote: >>> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@xxxxxxxxxxx> wrote: >>>> This enables acceleration for MMIO-based TPR registers accesses of >>>> 32-bit Windows guest systems. It is mostly useful with KVM enabled, >>>> either on older Intel CPUs (without flexpriority feature, can also be >>>> manually disabled for testing) or any current AMD processor. >>>> >>>> The approach introduced here is derived from the original version of >>>> qemu-kvm. It was refactored, documented, and extended by support for >>>> user space APIC emulation, both with and without KVM acceleration. The >>>> VMState format was kept compatible, so was the ABI to the option ROM >>>> that implements the guest-side para-virtualized driver service. This >>>> enables seamless migration from qemu-kvm to upstream or, one day, >>>> between KVM and TCG mode. >>>> >>>> The basic concept goes like this: >>>> - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel >>>> irqchip) a vmcall hypercall is registered >>>> - VAPIC option ROM is loaded into guest >>>> - option ROM activates TPR MMIO access reporting via port 0x7e >>>> - TPR accesses are trapped and patched in the guest to call into option >>>> ROM instead, VAPIC support is enabled >>>> - option ROM TPR helpers track state in memory and invoke hypercall to >>>> poll for pending IRQs if required >>>> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx> >>> >>> I must say that I find the approach horrible, patching guests and ROMs >>> and looking up Windows internals. Taking the same approach to extreme, >>> we could for example patch Xen guest to become a KVM guest. Not that I >>> object merging. >> >> Yes, this is horrible. But there is no real better way in the absence of >> hardware assisted virtualization of the TPR. I think MS is recommending >> this patching approach as well. > > Maybe instead of routing via ROM and the hypercall, the TPR accesses > could be handled directly with guest invisible breakpoints (like GDB > breakpoints, but for QEMU internal use), much like other > instrumentation could be handled. Gleb answered it already. >>>> @@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev) >>>> { >>>> APICCommonState *s = APIC_COMMON(dev); >>>> APICCommonClass *info; >>>> + static DeviceState *vapic; >>>> static int apic_no; >>>> >>>> if (apic_no >= MAX_APICS) { >>>> @@ -248,10 +286,29 @@ static int apic_init_common(SysBusDevice *dev) >>>> info = APIC_COMMON_GET_CLASS(s); >>>> info->init(s); >>>> >>>> - sysbus_init_mmio(&s->busdev, &s->io_memory); >>>> + sysbus_init_mmio(dev, &s->io_memory); >>>> + >>>> + if (!vapic && s->vapic_control & VAPIC_ENABLE_MASK) { >>>> + vapic = sysbus_create_simple("kvmvapic", -1, NULL); >>>> + } >>>> + s->vapic = vapic; >>>> + if (apic_report_tpr_access && info->enable_tpr_reporting) { >>> >>> I think you should not rely on apic_report_tpr_access being in sane >>> condition during class init. >> >> It is mandatory, e.g. for CPU hotplug, as reporting needs to be >> consistent accross all VCPUs. Therefore it is a static global, set to >> false initially. However, you are right, we lack proper clearing of the >> access report feature on reset, not only in this variable. > > I'd also set it to false initially. It's a global variable, thus initialized to false by definition. >>>> + >>>> +#define VAPIC_CPU_SHIFT 7 >>>> + >>>> +#define ROM_BLOCK_SIZE 512 >>>> +#define ROM_BLOCK_MASK (~(ROM_BLOCK_SIZE - 1)) >>>> + >>>> +typedef struct VAPICHandlers { >>>> + uint32_t set_tpr; >>>> + uint32_t set_tpr_eax; >>>> + uint32_t get_tpr[8]; >>>> + uint32_t get_tpr_stack; >>>> +} QEMU_PACKED VAPICHandlers; >>>> + >>>> +typedef struct GuestROMState { >>>> + char signature[8]; >>>> + uint32_t vaddr; >>> >>> This does not look 64 bit clean. >> >> It's packed. > > I meant "virtual address could be 64 bits on a 64 bit host", not > structure packing. This is for 32-bit guests only. 64-bit Windows doesn't access the TPR via MMIO, thus is not activating the VAPIC. >>>> + uint32_t state; >>>> + uint32_t rom_state_paddr; >>>> + uint32_t rom_state_vaddr; >>>> + uint32_t vapic_paddr; >>>> + uint32_t real_tpr_addr; >>>> + GuestROMState rom_state; >>>> + size_t rom_size; >>>> +} VAPICROMState; >>>> + >>>> +#define TPR_INSTR_IS_WRITE 0x1 >>>> +#define TPR_INSTR_ABS_MODRM 0x2 >>>> +#define TPR_INSTR_MATCH_MODRM_REG 0x4 >>>> + >>>> +typedef struct TPRInstruction { >>>> + uint8_t opcode; >>>> + uint8_t modrm_reg; >>>> + unsigned int flags; >>>> + size_t length; >>>> + off_t addr_offset; >>>> +} TPRInstruction; >>> >>> Also here the order is pessimized. >> >> Don't see the gain here, though. > > There are two bytes' hole between modrm_reg and flags, maybe also 4 > bytes between length and addr_offset (if size_t is 32 bits but off_t > 64 bits). I'd reverse the order so that members with largest alignment > needs come first. Well, but this won't make the struct smaller. I prefer to keep the ordering in which we also initialize it. > >>>> +static int find_real_tpr_addr(VAPICROMState *s, CPUState *env) >>>> +{ >>>> + target_phys_addr_t paddr; >>>> + target_ulong addr; >>>> + >>>> + if (s->state == VAPIC_ACTIVE) { >>>> + return 0; >>>> + } >>>> + for (addr = 0xfffff000; addr >= 0x80000000; addr -= TARGET_PAGE_SIZE) { >>>> + paddr = cpu_get_phys_page_debug(env, addr); >>>> + if (paddr != APIC_DEFAULT_ADDRESS) { >>>> + continue; >>>> + } >>>> + s->real_tpr_addr = addr + 0x80; >>>> + update_guest_rom_state(s); >>>> + return 0; >>>> + } >>> >>> This loop looks odd, what should it do, probe for unused address? >> >> Seems to deserve a comment: We have to scan for the guest's mapping of >> the APIC as we enter here without a hint from an TPR accessing >> instruction. So we probe the potential range, trying to find the page >> that maps to that known physical address (known in the sense that >> Windows does not remap the APIC physically - nor does QEMU support that >> so far). > > Yes, more comments would be nice, especially on theory of operation. > >>>> +static int evaluate_tpr_instruction(VAPICROMState *s, CPUState *env, >>>> + target_ulong *pip, int access) >>>> +{ >>>> + const TPRInstruction *instr; >>>> + target_ulong ip = *pip; >>>> + uint8_t opcode[2]; >>>> + uint32_t real_tpr_addr; >>>> + int i; >>>> + >>>> + if ((ip & 0xf0000000) != 0x80000000 && (ip & 0xf0000000) != 0xe0000000) { >>> >>> The constants should be using ULL suffix because target_ulong could be >>> 64 bit, though maybe this is more optimal. >> >> target_ulong is 64-bit unconditionally on x86. I'll add this. > > No, target_phys_addr_t is now 64 bits, but target_ulong (register > size) is 32 bits for i386-softmmu. Ah, right. > >>>> + >>>> +/* >>>> + * Tries to read the unique processor number from the Kernel Processor Control >>>> + * Region (KPCR) of 32-bit Windows. Returns -1 if the KPCR cannot be accessed >>>> + * or is considered invalid. >>>> + */ >>> >>> Horrible hack. Is guest OS type or version checked somewhere? >> >> This is all about hacking Windows 32-bit. And this check encodes that >> even stronger. The other important binding is the expected virtual >> address of the ROM mapping under Windows. I would have preferred >> checking the version directly, but no one has a complete list of >> supported guests and their codes. > > Then it would be nice to only enable this with a command line switch, > so that some random poor non-Windows OS is not patched incorrectly. I had the same concern, but there is no need to worry, we have sufficient checks that no other guest is affected. And we do have a switch as well, the APIC property. But this is better left on by default to please our guests with optimal performance. > >>> >>>> +static int get_kpcr_number(CPUState *env) >>>> +{ >>>> + struct kpcr { >>>> + uint8_t fill1[0x1c]; >>>> + uint32_t self; >>>> + uint8_t fill2[0x31]; >>>> + uint8_t number; >>>> + } QEMU_PACKED kpcr; >>> >>> KPCR. Pointers to Windows documentation would be nice. >> >> Oops, yes. >> >> Unfortunately, this is only an internal structure, not officially >> documented by MS. However, all supported OS versions a legacy by now, no >> longer changing its structure. > > This and a note about the supported OS versions could be added as comment. OK. For the folks that developed it in qemu-kvm: This targets Windows XP, Vista and Server 2003, all 32-bit, right? Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html