> From: Nadav Har'El [mailto:nyh@xxxxxxxxxxxxxxxxxxx]
> Sent: Wednesday, May 25, 2011 8:34 PM
>
> On Wed, May 25, 2011, Tian, Kevin wrote about "RE: [PATCH 23/31] nVMX:
> Correct handling of interrupt injection":
> > >  static void enable_irq_window(struct kvm_vcpu *vcpu)
> > >  {
> > >  	u32 cpu_based_vm_exec_control;
> > > +	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
> > > +		/* We can get here when nested_run_pending caused
> > > +		 * vmx_interrupt_allowed() to return false. In this case, do
> > > +		 * nothing - the interrupt will be injected later.
> > > +		 */
> >
> > I think this is not a rare path? When the vcpu is in guest mode with L2
> > as the current vmx context, this function could be invoked multiple
> > times, since the kvm thread can be scheduled out/in randomly.
>
> As I wrote in this comment, this can only happen on nested_run_pending
> (i.e., VMLAUNCH/VMRESUME emulation), because if !nested_run_pending,
> and nested_exit_on_intr(), vmx_interrupt_allowed() would have already
> exited L2, and we wouldn't be in this case.
>
> I don't know whether to classify this as a "rare" path - it's definitely
> not very common. But what does it matter if it's rare or common?

It doesn't matter much; I was just trying to understand your comment.

> > > +	if (to_vmx(vcpu)->nested.nested_run_pending)
> > > +		return 0;
> >
> > Well, now I can see why you require this special 'nested_run_pending'
> > flag: there are places where L0 injects virtual interrupts right after
> > VMLAUNCH/VMRESUME emulation and before entering L2. :-)
>
> Indeed. I tried to explain that in the patch description, where I wrote:
>
>   We keep a new flag, "nested_run_pending", which can override the
>   decision of which should run next, L1 or L2. nested_run_pending=1 means
>   that we *must* run L2 next, not L1. This is necessary in particular when
>   L1 did a VMLAUNCH of L2 and therefore expects L2 to be run (and perhaps
>   be injected with an event it specified, etc.). Nested_run_pending is
>   especially intended to avoid switching to L1 in the injection
>   decision-point described above.

When nested_run_pending is first introduced its usage is simple, which made
me think the field might not be required. But several later patches do
depend on this flag for correctness. :-)
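Just so we are looking at the same thing, this is how I read the whole of
enable_irq_window() with this patch applied. The quoted hunk above stops at
the comment, and the read-modify-write of CPU_BASED_VM_EXEC_CONTROL at the
bottom is reconstructed from my memory of the existing vmx.c code, so
please treat this as a sketch rather than the exact upstream function:

static void enable_irq_window(struct kvm_vcpu *vcpu)
{
        u32 cpu_based_vm_exec_control;

        if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
                /*
                 * Only reachable when nested_run_pending made
                 * vmx_interrupt_allowed() return false: L2 must be entered
                 * once before anything else, so don't request an interrupt
                 * window now; the interrupt is injected after the later
                 * L2->L1 exit.
                 */
                return;

        cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
        cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
        vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
}

Read that way, the early return can only trigger in the narrow window
between VMLAUNCH/VMRESUME emulation and the first entry into L2, which
answers my "rare path" question above.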
> > > +		nested_vmx_vmexit(vcpu);
> > > +		vmcs12 = get_vmcs12(vcpu);
> > > +		vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
> > > +		vmcs12->vm_exit_intr_info = 0;
> > > +		/* fall through to normal code, but now in L1, not L2 */
> > > +	}
> > > +
> >
> > This is a bad place to add this logic. vmx_interrupt_allowed is simply a
> > query function, but you make it an absolute trigger point for switching
> > from L2 to L1. This is fine for now, since the only point calling
> > vmx_interrupt_allowed is when there's a vNMI pending, but it's dangerous
> > to build such an assumption about pending events into
> > vmx_interrupt_allowed.
>
> Now you're beating a dead horse ;-)
>
> Gleb, and to some degree Avi, already argued that this is the wrong place
> to do this exit, and if anything the exit should be done (or just decided
> on) in enable_irq_window().
>
> My counter-argument was that the right way is *neither* of these
> approaches - any attempt to "commandeer" one of the existing x86 ops, be
> they vmx_interrupt_allowed() or enable_irq_window(), to do in the L2 case
> things they were never designed to do is both ugly, and dangerous if the
> call sites change at some time in the future.
>
> So rather than changing one ugly abuse of one function to the (arguably
> also ugly) abuse of another function, what I'd like to see is a better
> overall design, where the call sites in x86.c know about the possibility
> of a nested guest (they already do - like we previously discussed, an
> is_guest_mode() function was recently added), and when they need, *they*
> will call an exit-to-L1 function, rather than calling a function called
> "enable_irq_window" or "vmx_interrupt_allowed" which mysteriously will do
> the exit.

I agree with your point here.

> > On the other hand, I think there's one area which is not handled in a
> > timely manner. I think you need to kick an L2->L1 transition when L0
> > wants to inject a virtual interrupt. Consider your current logic:
> >
> > a) L2 is running on cpu1
> > b) L0 on cpu0 decides to post a virtual interrupt to L1. An IPI is
> > issued to cpu1 after updating the virtual irqchip
> > c) L2 on cpu1 vmexits to L0, which checks whether L0 or L1 should handle
> > the event. As it's an external interrupt, L0 will handle it. As it's
> > only a notification IPI, nothing more is required.
> > d) L0 on cpu1 then decides to resume, and finds KVM_REQ_EVENT set
> >
> > At this point you only add logic to enable_irq_window, but there's no
> > action to trigger the L2->L1 transition. So what will happen? Will the
> > event be injected into L2 instead, or stay pending until the next switch
> > happens for some other reason?
>
> I'm afraid I'm missing something in your explanation... In step d, L0
> finds an interrupt in the injection queue, so isn't the first thing it
> does to call vmx_interrupt_allowed(), to check whether injection is
> allowed now? In our code, "vmx_interrupt_allowed()" was bastardized to
> exit to L1 in this case. Isn't that the missing exit you were looking for?

This is a false alarm. In my earlier search I thought vmx_interrupt_allowed
was only invoked in vmx.c, for the pending-vNMI check, which led me to
suspect a bigger problem. But .interrupt_allowed is in fact checked in the
common x86.c path as expected, so the problem was on my side. :-)

Thanks
Kevin
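PS: to close the loop on my misreading, here is a tiny user-space model of
the decision point we have been discussing - the nested_run_pending gate
and the exit-to-L1 hidden inside the interrupt_allowed query. This is only
my reading of the patches boiled down to a toy; the names mirror the real
ones but nothing below is code from the series:

/* toy model only - not kvm code */
#include <stdbool.h>
#include <stdio.h>

struct toy_vcpu {
        bool guest_mode;            /* currently running L2 */
        bool nested_exit_on_intr;   /* L1 wants external-interrupt exits */
        bool nested_run_pending;    /* L1 just did VMLAUNCH/VMRESUME */
        bool guest_irq_enabled;     /* RFLAGS.IF of whoever runs next */
};

/* models vmx_interrupt_allowed(): may switch L2->L1 as a side effect */
static bool toy_interrupt_allowed(struct toy_vcpu *v)
{
        if (v->guest_mode && v->nested_exit_on_intr) {
                if (v->nested_run_pending)
                        return false;   /* L2 must be entered once first */
                v->guest_mode = false;  /* like nested_vmx_vmexit(): now in L1 */
        }
        return v->guest_irq_enabled;
}

int main(void)
{
        struct toy_vcpu v = { true, true, true, true };
        bool ok;

        /* L0 wants to inject right after VMLAUNCH emulation: must wait */
        printf("inject now? %d (expect 0)\n", toy_interrupt_allowed(&v));

        /* after the one mandatory L2 entry the flag is cleared ... */
        v.nested_run_pending = false;
        /* ... and the query itself drops us to L1, where injection proceeds */
        ok = toy_interrupt_allowed(&v);
        printf("inject now? %d (expect 1), guest_mode=%d (expect 0)\n",
               ok, v.guest_mode);
        return 0;
}

Compiled with gcc it prints 0 for the VMLAUNCH/VMRESUME window and then 1
with guest_mode cleared, which matches the behaviour described above.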