RE: event injection MACROs

"Dong, Eddie" <eddie.dong@xxxxxxxxx> · Thu, 14 May 2009 22:34:11 +0800

Gleb Natapov wrote:
> On Thu, May 14, 2009 at 09:43:33PM +0800, Dong, Eddie wrote:
>> Avi Kivity wrote:
>>> Dong, Eddie wrote:
>>>> OK.
>>>> Also back to Gleb's question, the reason I want to do that is to
>>>> simplify event generation mechanism in current KVM.
>>>> 
>>>> Today KVM use additional layer of exception/nmi/interrupt such as
>>>> vcpu.arch.exception.pending, vcpu->arch.interrupt.pending &
>>>> vcpu->arch.nmi_injected. All those additional layer is due to
>>>> compete of VM_ENTRY_INTR_INFO_FIELD
>>>> write to inject the event. Both SVM & VMX has only one resource to
>>>> inject the virtual event but KVM generates 3 catagory of events in
>>>> parallel which further requires additional
>>>> logic to dictate among them.
>>> 
>>> I thought of using a queue to hold all pending events (in a common
>>> format), sort it by priority, and inject the head.
>> 
>> The SDM Table 5-4 requires to merge 2 events together, i.e. convert
>> to #DF/ 
>> Triple fault or inject serially when 2 events happens no matter NMI,
>> IRQ or exception. 
>> 
>> As if considering above events merging activity, that is a single
>> element queue. 
> I don't know how you got to this conclusion from you previous
> statement. 
> See explanation to table 5-2 for instate where it is stated that
> interrupt should be held pending if there is exception with higher
> priority. Should be held pending where? In the queue, like we do. Note
> that low prio exceptions are just dropped since they will be
> regenerated. 

I have different understanding here.
My understanding is that "held" means NO INTA in HW, i.e. LAPIC still hold this IRQ.

> 
>>  We could have either:  1) A pure SW "queue" that will be flush to HW
>> register later (VM_ENTRY_INTR_INFO_FIELD), 2) Direct use HW register.
>> 
> We have three event sources 1) exceptions 2) IRQ 3) NMI. We should
> have 
> queue of three elements sorted by priority. On each entry we should

Table 5-4 alreadys says NMI/IRQ is BENIGN.

> inject an event with highest priority. And remove it from queue on
> exit. 

The problem is that we have to decide to inject only one of above 3, and discard the rest.
Whether priority them or merge (to one event as Table 5-4) is another story.

> 
>> 
>> A potential benefit is that it can avoid duplicated code and
>> potential bugs 
>> in current code as following patch shows if I understand correctly:
>> 
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2599,7 +2599,7 @@ static int handle_exception(struct kvm_vcpu
>>                 *vcpu, struct kvm_run *kvm_run) cr2 =
>>                 vmcs_readl(EXIT_QUALIFICATION);
>>                             KVMTRACE_3D(PAGE_FAULT, vcpu,
>> error_code, (u32)cr2, (u32)((u64)cr2 >> 32), handler); -            
>> if (vcpu->arch.interrupt.pending || vcpu->arch.exception.pending ) +
>>                         if (vcpu->arch.interrupt.pending ||
>>                 vcpu->arch.exception.pending  ||
>>         vcpu->arch.nmi_injected) kvm_mmu_unprotect_page_virt(vcpu,
>> cr2); return kvm_mmu_page_fault(vcpu, cr2, error_code); } 
> This fix is already in Avi's tree (not yet pushed).
> 
>> Either way are OK and up to you. BTW Xen uses HW register directly
>> to representing 
>> an pending event.
>> 
> In this particular case I don't mind to use HW register either, but I
> don't see any advantage.
> 
>>> 
>>>> One example is that exception has higher priority
>>>> than NMI/IRQ injection in current code which is not true in
>>>> reality. 
>>>> 
>>> 
>>> I don't think it matters in practice, since the guest will see it
>>> as a timing issue.  NMIs and IRQs are asynchronous (even those
>>> generated by the guest through the local APIC).
>> 
>> Yes. But also cause IRQ injection be delayed which may have side
>> effect. 
>> For example if guest exception handler is very longer or if guest
>> VCPU fall into recursive #GP. Within current logic, a guest IRQ
>> event from KDB (IPI) running 
>> on VCPU0, as an example, can't force the dead loop VCPU1 into KDB
>> since it 
>> is recursively #GP.
> If one #GP causes another #GP this is a #DF. If CPU has a chance to

Means another #GP in next instruction i.e. Beginning of #GP handler in guest.
No #DF here.

> executes 
> something in between KVM will have a chance to inject NMI.

Could have no chance in some cases though not very common.

> 
>> 
>>> 
>>>> Another issue is that an failed event from previous injection say
>>>> IRQ or NMI may be discarded if an virtual exception happens in the
>>>> EXIT handling now. With the patch of generic double fault handling,
>>>> this case should be handled as normally.
>>>> 
>>> 
>>> Discarding an exception is usually okay as it will be regenerated. 
>>> I don't think we discard interrupts or NMIs.
>> In reality (Running OS in guest), it doesn't happen so far. But
>> architecturally, 
>> it could. For example KVM injects an IRQ, but VM Resume get #PF and
>> back to KVM with IDT_VECTORING valid. Then KVM will put back the
>> failed 
>> IRQ to interrupt queue. But if #PF handling generates another
>> exception, 
>> then the interrupt queue won't be able to be injected, since KVM
>> inject 
>> exception first. And the interrupt queue is discarded at next VM
>> Exit. 
>> 
> I acknowledge the presence of the bug although I was not able to
> write a test case 
> to cause it yet, but it is easy to fix this without changing code too
> much. Unified event queue and clearing of only injected event on exit
> should do the trick. 

Yes, minor.

Eddie
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html