Re: [PATCH 3/6] kvmvapic: Introduce TPR access optimization for Windows guests

Jan Kiszka <jan.kiszka@xxxxxx> · Thu, 09 Feb 2012 17:32:12 +0100

On 2012-02-09 17:00, Avi Kivity wrote:
> On 02/09/2012 05:39 PM, Jan Kiszka wrote:
>> On 2012-02-09 16:18, Avi Kivity wrote:
>>> On 02/05/2012 02:39 PM, Jan Kiszka wrote:
>>>> From: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
>>>>
>>>> This enables acceleration for MMIO-based TPR registers accesses of
>>>> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
>>>> either on older Intel CPUs (without flexpriority feature, can also be
>>>> manually disabled for testing) or any current AMD processor.
>>>>
>>>> The approach introduced here is derived from the original version of
>>>> qemu-kvm. It was refactored, documented, and extended by support for
>>>> user space APIC emulation, both with and without KVM acceleration. 
>>>
>>> However, it's presented as a rewrite instead of a series of changes, so
>>> we can't see what the changes are.
>>
>> Yes, but the original code also depends on interfaces we don't have in
>> upstream.
> 
> The usual rant: patch qemu-kvm until it's suitable for upsteam, then
> submit the end result for upstream.  But you don't deserve this today.
> 
>>>> +
>>>> +    if (access == TPR_ACCESS_WRITE && kvm_enabled() &&
>>>> +        !kvm_irqchip_in_kernel()) {
>>>> +        /*
>>>> +         * KVM without TPR access reporting calls into the user space APIC on
>>>> +         * write with IP pointing after the accessing instruction. So we need
>>>> +         * to look backward to find the reason.
>>>> +         */
>>>
>>> Why do we need to do anything at all?
>>
>> We need to patch the causing instruction, so we have to know where it
>> starts. Or what do you mean?
> 
> Just don't deal with this at all, no one runs on kernels without kernel
> irqchip.

Not true for upstream, and not a design goal of this approach,
specifically when considering that it also works with TCG. Would be a
pity to lose this generality.

> 
>>>
>>> I'm not sure if the ABI guarantees anything about %rip.
>>
>> That's indeed a point. It's likely coupled to the emulator's internals
>> and when it calls out to user space for MMIO write. How to deal with it?
> 
> One way is to verify that it worked this way at least N versions back,
> and then retro-doc it.  The downside is that it reduces our flexibility
> in the future, but I think that's a small downside.

It need not reduce our flexibility, we just need to signal to user space
when our behaviour changes again.

> 
> Another way is not to do it at all.
> 
>>>> +static int get_kpcr_number(CPUState *env)
>>>> +{
>>>> +    struct kpcr {
>>>> +        uint8_t  fill1[0x1c];
>>>> +        uint32_t self;
>>>> +        uint8_t  fill2[0x31];
>>>> +        uint8_t  number;
>>>> +    } QEMU_PACKED kpcr;
>>>> +
>>>> +    if (smp_cpus == 1) {
>>>> +        return 0;
>>>> +    }
>>>> +    if (cpu_memory_rw_debug(env, env->segs[R_FS].base,
>>>> +                            (void *)&kpcr, sizeof(kpcr), 0) < 0 ||
>>>> +        kpcr.self != env->segs[R_FS].base) {
>>>
>>> Ah, so it works.  We may want to do it for UP as well, as a way of
>>> verifying that the guest is compatible with these hacks.
>>
>> I'm not sure if Windows has this properly set up for the UP HAL. I
>> rather think this was a bug in the original implementation. The ROM uses
>> 0 as CPU index in UP mode unconditionally, so should we in QEMU.
> 
> I mean just check kpcr.self.

Yes, clear, but that means that Windows must have initialized FS.base to
point to the KPCR also in UP mode. Is that really the case? E.g. when
ACPI is off?! I wonder if that explains the reported bug of qemu-kvm
with -no-acpi and in-kernel irqchip...

> 
> The reason up and smp are so different is that for a long while smp
> didn't work at all, and for some time it used even uglier hacks than we
> have today (like stashing the cpu id in TR.sel), so I didn't want to
> pollute th up code with the smp ugliness.  It's also marginally faster
> (less locked ops), though I doubt it matters on today's processors.
> 
>>>
>>>> +
>>>> +    memory_region_init_alias(&s->rom, "kvmvapic-rom", section.mr, rom_paddr,
>>>> +                             rom_size);
>>>> +    memory_region_add_subregion_overlap(as, rom_paddr, &s->rom, 1000);
>>>
>>> This is incredibly hacky, so at least the spirit of the original code is
>>> preserved.
>>
>> I know, and it caused some pain to write it (not only to find out how to
>> solve it technically). We would need to pass the RAM memory region down
>> to this freaky device, like we do to the i440fx for PAM purposes. But,
>> well, that is not straightforward right now. Better ideas welcome.
> 
> Could we make it a child<> of i440FX, and have i440FX pass it the
> MemoryRegion directly?
> 
> It means we'll need to redo the glue for new chipsets, but it should be
> just a few lines.

Hmm... not really nice. It is rather a child of the APIC than of the
chipset IMHO. Moving it over would also mean establishing logical link
to the APIC from there. It is really an ugly beast...

Jan

Attachment:
signature.asc

Description: OpenPGP digital signature