Re: [PATCH 2/3] KVM: x86: Add support for VMware guest specific hypercalls

Doug Covelli <doug.covelli@xxxxxxxxxxxx> · Wed, 13 Nov 2024 11:24:56 -0500

On Wed, Nov 13, 2024 at 2:31 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
>
>
> Il mar 12 nov 2024, 21:44 Doug Covelli <doug.covelli@xxxxxxxxxxxx> ha scritto:
>>
>> > Split irqchip should be the best tradeoff. Without it, moves from cr8
>> > stay in the kernel, but moves to cr8 always go to userspace with a
>> > KVM_EXIT_SET_TPR exit. You also won't be able to use Intel
>> > flexpriority (in-processor accelerated TPR) because KVM does not know
>> > which bits are set in IRR. So it will be *really* every move to cr8
>> > that goes to userspace.
>>
>> Sorry to hijack this thread but is there a technical reason not to allow CR8
>> based accesses to the TPR (not MMIO accesses) when the in-kernel local APIC is
>> not in use?
>
>
> No worries, you're not hijacking :) The only reason is that it would be more code for a seldom used feature and anyway with worse performance. (To be clear, CR8 based accesses are allowed, but stores cause an exit in order to check the new TPR against IRR. That's because KVM's API does not have an equivalent of the TPR threshold as you point out below).

I have not really looked at the code but it seems like it could also
simplify things as CR8 would be handled more uniformly regardless of
who is virtualizing the local APIC.

>> Also I could not find these documented anywhere but with MSFT's APIC our monitor
>> relies on extensions for trapping certain events such as INIT/SIPI plus LINT0
>> and SVR writes:
>>
>> UINT64 X64ApicInitSipiExitTrap    : 1; // WHvRunVpExitReasonX64ApicInitSipiTrap
>> UINT64 X64ApicWriteLint0ExitTrap  : 1; // WHvRunVpExitReasonX64ApicWriteTrap
>> UINT64 X64ApicWriteLint1ExitTrap  : 1; // WHvRunVpExitReasonX64ApicWriteTrap
>> UINT64 X64ApicWriteSvrExitTrap    : 1; // WHvRunVpExitReasonX64ApicWriteTrap
>
>
> There's no need for this in KVM's in-kernel APIC model. INIT and SIPI are handled in the hypervisor and you can get the current state of APs via KVM_GET_MPSTATE. LINT0 and LINT1 are injected with KVM_INTERRUPT and KVM_NMI respectively, and they obey IF/PPR and NMI blocking respectively, plus the interrupt shadow; so there's no need for userspace to know when LINT0/LINT1 themselves change. The spurious interrupt vector register is also handled completely in kernel.

I realize that KVM can handle LINT0/SVR updates themselves but our
interrupt subsystem relies on knowing the current values of these
registers even when not virtualizing the local APIC.  I suppose we
could use KVM_GET_LAPIC to sync things up on demand but that seems
like it might nor be great from a performance point of view.

>> I did not see any similar functionality for KVM.  Does anything like that exist?
>> In any case we would be happy to add support for handling CR8 accesses w/o
>> exiting w/o the in-kernel APIC along with some sort of a way to configure the
>> TPR threshold if folks are not opposed to that.
>
>
> As far I know everybody who's using KVM (whether proprietary or open source) has had no need for that, so I don't think it's a good idea to make the API more complex. Performance of Windows guests is going to be bad anyway with userspace APIC.