Re: [PATCH 3/6] kvmvapic: Introduce TPR access optimization for Windows guests

Avi Kivity <avi@xxxxxxxxxx> · Thu, 09 Feb 2012 18:00:45 +0200

On 02/09/2012 05:39 PM, Jan Kiszka wrote:
> On 2012-02-09 16:18, Avi Kivity wrote:
> > On 02/05/2012 02:39 PM, Jan Kiszka wrote:
> >> From: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
> >>
> >> This enables acceleration for MMIO-based TPR registers accesses of
> >> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
> >> either on older Intel CPUs (without flexpriority feature, can also be
> >> manually disabled for testing) or any current AMD processor.
> >>
> >> The approach introduced here is derived from the original version of
> >> qemu-kvm. It was refactored, documented, and extended by support for
> >> user space APIC emulation, both with and without KVM acceleration. 
> > 
> > However, it's presented as a rewrite instead of a series of changes, so
> > we can't see what the changes are.
>
> Yes, but the original code also depends on interfaces we don't have in
> upstream.

The usual rant: patch qemu-kvm until it's suitable for upsteam, then
submit the end result for upstream.  But you don't deserve this today.

> >> +
> >> +    if (access == TPR_ACCESS_WRITE && kvm_enabled() &&
> >> +        !kvm_irqchip_in_kernel()) {
> >> +        /*
> >> +         * KVM without TPR access reporting calls into the user space APIC on
> >> +         * write with IP pointing after the accessing instruction. So we need
> >> +         * to look backward to find the reason.
> >> +         */
> > 
> > Why do we need to do anything at all?
>
> We need to patch the causing instruction, so we have to know where it
> starts. Or what do you mean?

Just don't deal with this at all, no one runs on kernels without kernel
irqchip.

> > 
> > I'm not sure if the ABI guarantees anything about %rip.
>
> That's indeed a point. It's likely coupled to the emulator's internals
> and when it calls out to user space for MMIO write. How to deal with it?

One way is to verify that it worked this way at least N versions back,
and then retro-doc it.  The downside is that it reduces our flexibility
in the future, but I think that's a small downside.

Another way is not to do it at all.

> >> +static int get_kpcr_number(CPUState *env)
> >> +{
> >> +    struct kpcr {
> >> +        uint8_t  fill1[0x1c];
> >> +        uint32_t self;
> >> +        uint8_t  fill2[0x31];
> >> +        uint8_t  number;
> >> +    } QEMU_PACKED kpcr;
> >> +
> >> +    if (smp_cpus == 1) {
> >> +        return 0;
> >> +    }
> >> +    if (cpu_memory_rw_debug(env, env->segs[R_FS].base,
> >> +                            (void *)&kpcr, sizeof(kpcr), 0) < 0 ||
> >> +        kpcr.self != env->segs[R_FS].base) {
> > 
> > Ah, so it works.  We may want to do it for UP as well, as a way of
> > verifying that the guest is compatible with these hacks.
>
> I'm not sure if Windows has this properly set up for the UP HAL. I
> rather think this was a bug in the original implementation. The ROM uses
> 0 as CPU index in UP mode unconditionally, so should we in QEMU.

I mean just check kpcr.self.

The reason up and smp are so different is that for a long while smp
didn't work at all, and for some time it used even uglier hacks than we
have today (like stashing the cpu id in TR.sel), so I didn't want to
pollute th up code with the smp ugliness.  It's also marginally faster
(less locked ops), though I doubt it matters on today's processors.

> > 
> >> +
> >> +    memory_region_init_alias(&s->rom, "kvmvapic-rom", section.mr, rom_paddr,
> >> +                             rom_size);
> >> +    memory_region_add_subregion_overlap(as, rom_paddr, &s->rom, 1000);
> > 
> > This is incredibly hacky, so at least the spirit of the original code is
> > preserved.
>
> I know, and it caused some pain to write it (not only to find out how to
> solve it technically). We would need to pass the RAM memory region down
> to this freaky device, like we do to the i440fx for PAM purposes. But,
> well, that is not straightforward right now. Better ideas welcome.

Could we make it a child<> of i440FX, and have i440FX pass it the
MemoryRegion directly?

It means we'll need to redo the glue for new chipsets, but it should be
just a few lines.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html