On 27/11/14 10:54, Anup Patel wrote:
> On Thu, Nov 27, 2014 at 4:10 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>> On 27/11/14 10:22, Anup Patel wrote:
>>> On Tue, Nov 25, 2014 at 7:12 PM, Christoffer Dall
>>> <christoffer.dall@xxxxxxxxxx> wrote:
>>>> On Tue, Nov 25, 2014 at 06:17:03PM +0530, Anup Patel wrote:
>>>>> Hi Christoffer,
>>>>>
>>>>> On Mon, Nov 24, 2014 at 8:07 PM, Christoffer Dall
>>>>> <christoffer.dall@xxxxxxxxxx> wrote:
>>>>>> On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote:
>>>>>>> On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall
>>>>>>> <christoffer.dall@xxxxxxxxxx> wrote:
>>>>>>>> On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
>>>>>>>>> Hi Christoffer,
>>>>>>>>>
>>>>>>>>> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall
>>>>>>>>> <christoffer.dall@xxxxxxxxxx> wrote:
>>>>>>>>>> On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>>>>>>>>>>> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
>>>>>>>>>>> <christoffer.dall@xxxxxxxxxx> wrote:
>>>>>>>>>>>> On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have second thoughts about rebasing KVM PMU patches
>>>>>>>>>>>>> to Marc's irq-forwarding patches.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The PMU IRQs (when virtualized by KVM) are not exactly
>>>>>>>>>>>>> forwarded IRQs because they are shared between Host
>>>>>>>>>>>>> and Guest.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Scenario1
>>>>>>>>>>>>> -------------
>>>>>>>>>>>>>
>>>>>>>>>>>>> We might have perf running on Host and no KVM guest
>>>>>>>>>>>>> running. In this scenario, we won't get interrupts on Host
>>>>>>>>>>>>> because the kvm_pmu_hyp_init() (similar to the function
>>>>>>>>>>>>> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>>>>>>>>>>>>> implementation) has put all host PMU IRQs in forwarding
>>>>>>>>>>>>> mode.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The only way to solve this problem is to not set forwarding
>>>>>>>>>>>>> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>>>>>>>>>>>>> have special routines to turn on and turn off the forwarding
>>>>>>>>>>>>> mode of PMU IRQs. These routines will be called from
>>>>>>>>>>>>> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>>>>>>>>>>>>> forwarding state.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Scenario2
>>>>>>>>>>>>> -------------
>>>>>>>>>>>>>
>>>>>>>>>>>>> We might have perf running on Host and Guest simultaneously,
>>>>>>>>>>>>> which means it is quite likely that the PMU HW triggers an IRQ meant
>>>>>>>>>>>>> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>>>>>>>>>>>>> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>>>>>>>>>>>>> of Marc's patchset which is called before local_irq_enable()).
>>>>>>>>>>>>>
>>>>>>>>>>>>> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>>>>>>>>>>>>> will accidentally forward an IRQ meant for Host to Guest unless
>>>>>>>>>>>>> we put additional checks to inspect VCPU PMU state.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am I missing any detail about IRQ forwarding for above
>>>>>>>>>>>>> scenarios?
>>>>>>>>>>>>>
>>>>>>>>>>>> Hi Anup,
>>>>>>>>>>>
>>>>>>>>>>> Hi Christoffer,
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I briefly discussed this with Marc. What I don't understand is how it
>>>>>>>>>>>> would be possible to get an interrupt for the host while running the
>>>>>>>>>>>> guest?
>>>>>>>>>>>>
>>>>>>>>>>>> The rationale behind my question is that whenever you're running the
>>>>>>>>>>>> guest, the PMU should be programmed exclusively with guest state, and
>>>>>>>>>>>> since the PMU is per core, any interrupts should be for the guest, where
>>>>>>>>>>>> it would always be pending.
>>>>>>>>>>>
>>>>>>>>>>> Yes, that's right: the PMU is programmed exclusively for guest when
>>>>>>>>>>> guest is running and for host when host is running.
>>>>>>>>>>>
>>>>>>>>>>> Let us assume a situation (Scenario2 mentioned previously)
>>>>>>>>>>> where both host and guest are using PMU. When the guest is
>>>>>>>>>>> running we come back to host mode due to a variety of reasons
>>>>>>>>>>> (stage2 fault, guest IO, regular host interrupt, host interrupt
>>>>>>>>>>> meant for guest, ....) which means we will return from the
>>>>>>>>>>> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>>>>>>>>>>> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>>>>>>>>>>> At this point we would have restored back host PMU context and
>>>>>>>>>>> any PMU counter used by host can trigger a PMU overflow interrupt
>>>>>>>>>>> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
>>>>>>>>>>> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>>>>>>>>>>> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>>>>>>>>>>> which will try to detect PMU irq forwarding state in GIC, hence it
>>>>>>>>>>> can accidentally discover PMU irq pending for guest while this
>>>>>>>>>>> PMU irq is actually meant for host.
>>>>>>>>>>>
>>>>>>>>>>> This above mentioned situation does not happen for timer
>>>>>>>>>>> because virtual timer interrupts are exclusively used for guest.
>>>>>>>>>>> The exclusive use of virtual timer interrupt for guest ensures that
>>>>>>>>>>> the function kvm_timer_sync_hwstate() will always see correct
>>>>>>>>>>> state of virtual timer IRQ from GIC.
>>>>>>>>>>>
>>>>>>>>>> I'm not quite following.
>>>>>>>>>>
>>>>>>>>>> When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
>>>>>>>>>> you would (1) capture the active state of the IRQ pertaining to the
>>>>>>>>>> guest and (2) deactivate the IRQ on the host, then (3) switch the state of
>>>>>>>>>> the PMU to the host state, and finally (4) re-enable IRQs on the CPU
>>>>>>>>>> you're running on.
>>>>>>>>>>
>>>>>>>>>> If the host PMU state restored in (3) causes the PMU to raise an
>>>>>>>>>> interrupt, you'll take an interrupt after (4), which is for the host,
>>>>>>>>>> and you'll handle it on the host.
>>>>>>>>>>
>>>>>>>>> We only switch PMU state in assembly code using
>>>>>>>>> kvm_call_hyp(__kvm_vcpu_run, vcpu)
>>>>>>>>> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
>>>>>>>>> the current hardware PMU state is for host. This means whenever
>>>>>>>>> we are in host mode the host PMU can change state of PMU IRQ
>>>>>>>>> in GIC even if local IRQs are disabled.
>>>>>>>>>
>>>>>>>>> We inspect the active state of the PMU IRQ in the
>>>>>>>>> kvm_pmu_sync_hwstate() function using the irq_get_fwd_state() API.
>>>>>>>>> Here we are not guaranteed that the IRQ forward state returned by the
>>>>>>>>> irq_get_fwd_state() API is for guest only.
>>>>>>>>>
>>>>>>>>> The above situation does not manifest for virtual timer because
>>>>>>>>> virtual timer registers are exclusively accessed by Guest and
>>>>>>>>> virtual timer interrupt is only for Guest (never used by Host).
>>>>>>>>>
>>>>>>>>>> Whenever you schedule the guest VCPU again, you'll (a) disable
>>>>>>>>>> interrupts on the CPU, (b) restore the active state of the IRQ for the
>>>>>>>>>> guest, (c) restore the guest PMU state, (d) switch to the guest with
>>>>>>>>>> IRQs enabled on the CPU (potentially).
>>>>>>>>>
>>>>>>>>> Here too, while we are between step (a) and step (b) the PMU HW
>>>>>>>>> context is for host and any PMU counter can overflow. The step (b)
>>>>>>>>> can actually override the PMU IRQ meant for Host.
>>>>>>>>>
>>>>>>>> Can you not simply switch the state from C-code after capturing the IRQ
>>>>>>>> state then? Everything should be accessible from EL1, right?
>>>>>>>
>>>>>>> Yes, I think that would be the only option. This also means I will need
>>>>>>> to re-implement context switching for doing it in C-code.
>>>>>>
>>>>>> Yes, you'd add some inline assembly in the C-code to access the
>>>>>> registers I guess. Only thing I thought about after writing my original
>>>>>> mail is whether you'll be counting events while context-switching and
>>>>>> running on the host, which you actually don't want to. Not sure if
>>>>>> there's a better way to avoid that.
>>>>>>
>>>>>>>
>>>>>>> What about the scenario1 which I had mentioned?
>>>>>>>
>>>>>>
>>>>>> You have to consider that enabling/disabling forwarding and setting/clearing
>>>>>> the active state are part of the guest PMU state and all of it has to be
>>>>>> context-switched.
>>>>>
>>>>> I found one more issue.
>>>>>
>>>>> If PMU irq is PPI then enabling/disabling forwarding will not
>>>>> work because irqd_set_irq_forwarded() function takes irq_data
>>>>> as argument which is member of irq_desc and irq_desc for PPIs
>>>>> is not per_cpu. This means we cannot call irqd_set_irq_forwarded()
>>>>> simultaneously from different host CPUs.
>>>>>
>>>
>>> Hi Marc,
>>>
>>>> I'll let Marc answer this one and if this still applies to his view of
>>>> how the next version of the forwarding series will look like.
>>
>> I'm looking at it at the moment.
>>
>> I'm inclined to say that we should fix the forwarding code to allow
>> individual PPIs to be forwarded. This is a bit harder than what we're
>> doing at the moment, but that's possible.
>>
>> Of course, that complicates the code a bit, as we have to make sure
>> we're not preemptible at that time.
>>
>> What do you think?
>
> Currently, irqd_set_irq_forwarded() is lockless.
>
> It would be great if we can update irqd_set_irq_forwarded() for PPIs
> such that it remains lockless, so that we don't have much overhead
> when we enable/disable forwarding state.

We probably need a separate API anyway, as you want to be able to
provide a cpumask to configure this. We can refine this as we go, and I
wouldn't worry about overhead just yet.

Thanks,

        M.
-- 
Jazz is not dead.
It just smells funny...