On Mon, Feb 16, 2015 at 05:46:54PM +0530, Anup Patel wrote:
> Hi Christoffer,
>
> On Sun, Feb 15, 2015 at 9:03 PM, Christoffer Dall
> <christoffer.dall@xxxxxxxxxx> wrote:
> > Hi Anup,
> >
> > On Mon, Jan 12, 2015 at 09:49:13AM +0530, Anup Patel wrote:
> >> On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
> >> <christoffer.dall@xxxxxxxxxx> wrote:
> >> > On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
> >> >> (dropping previous conversation for easy reading)
> >> >>
> >> >> Hi Marc/Christoffer,
> >> >>
> >> >> I tried implementing the PMU context switch via C code
> >> >> in EL1 mode and in atomic context with irqs disabled.
> >> >> The context switch itself works perfectly fine, but
> >> >> irq forwarding is not clean for the PMU irq.
> >> >>
> >> >> I found another issue: the GIC only samples irq
> >> >> lines if they are enabled. This means that to use
> >> >> irq forwarding we will need to ensure that the host PMU
> >> >> irq is enabled. The arch_timer code does this by
> >> >> doing request_irq() for the host virtual timer interrupt.
> >> >> For the PMU, we can either enable/disable the host PMU
> >> >> irq in the context switch, or we need to have a shared
> >> >> irq handler between the KVM PMU and the host kernel PMU.
> >> >
> >> > could we simply require the host PMU driver to request the IRQ and have
> >> > the driver inject the corresponding IRQ to the VM via a mechanism
> >> > similar to VFIO using an eventfd and irqfds etc.?
> >>
> >> Currently, the host PMU driver does request_irq() only when
> >> there is some event to be monitored. This means the host will do
> >> request_irq() only when we run a perf application in host
> >> user space.
> >>
> >> Initially, I thought that we could simply pass IRQF_SHARED
> >> to request_irq() in the host PMU driver and do the same for
> >> request_irq() in the KVM PMU code, but the PMU irq can be
> >> an SPI or a PPI. If the PMU irq is an SPI then the IRQF_SHARED
> >> flag would be fine, but if it is a PPI then we have no way to
> >> set the IRQF_SHARED flag because request_percpu_irq()
> >> does not have an irq flags parameter.
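
(As a side note on the API asymmetry described above: a minimal,
hypothetical sketch of why an SPI can be shared but a PPI cannot,
using the stock request_irq()/request_percpu_irq() interfaces of that
era. The handler and helper names are invented for illustration; only
the two request calls reflect the real kernel API.)

/*
 * Illustrative only: a shared SPI can be requested with IRQF_SHARED,
 * while the per-cpu PPI path has no flags argument at all.
 */
#include <linux/interrupt.h>

static irqreturn_t kvm_pmu_irq_handler(int irq, void *dev_id)
{
	/* Would need to decide: host overflow or guest overflow? */
	return IRQ_NONE;
}

static irqreturn_t kvm_pmu_percpu_irq_handler(int irq, void *percpu_dev_id)
{
	return IRQ_NONE;
}

static int kvm_pmu_request_irq(int irq, bool is_ppi,
			       void __percpu *cpu_data)
{
	if (!is_ppi)
		/* SPI: flag-based sharing with the host PMU driver works. */
		return request_irq(irq, kvm_pmu_irq_handler,
				   IRQF_SHARED, "kvm-pmu", cpu_data);

	/*
	 * PPI: request_percpu_irq(irq, handler, devname, percpu_dev_id)
	 * takes no flags, so there is no way to ask for IRQF_SHARED here.
	 */
	return request_percpu_irq(irq, kvm_pmu_percpu_irq_handler,
				  "kvm-pmu", cpu_data);
}

So with a PPI there is no flag-based sharing; KVM and the host PMU
driver would have to coordinate explicitly instead.
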
> >> >
> >> > (I haven't quite thought through if there's a way for the host PMU
> >> > driver to distinguish between an IRQ for itself and one for the guest,
> >> > though).
> >> >
> >> > It does feel like we will need some sort of communication/coordination
> >> > between the host PMU driver and KVM...
> >> >
> >> >>
> >> >> I have rethought our discussion so far. I
> >> >> understand that we need KVM PMU virtualization
> >> >> to meet the following criteria:
> >> >> 1. No modification in the host PMU driver
> >> >
> >> > is this really a strict requirement? one of the advantages of KVM
> >> > should be that the rest of the kernel should be supportive of KVM.
> >>
> >> I guess so, because the host PMU driver should not do things
> >> differently for host and guest. I think this is the reason why
> >> we discarded the mask/unmask PMU irq approach which
> >> I had implemented in RFC v1.
> >>
> >> >
> >> >> 2. No modification in the guest PMU driver
> >> >> 3. No mask/unmask dance for sharing the host PMU irq
> >> >> 4. A clean way to avoid infinite VM exits due to the
> >> >> PMU interrupt
> >> >>
> >> >> I have discovered a new approach, which is as follows:
> >> >> 1. Context switch the PMU in atomic context (i.e. local_irq_disable())
> >> >> 2. Ensure that the host PMU irq is disabled when entering guest
> >> >> mode, and re-enable the host PMU irq when exiting guest mode if
> >> >> it was enabled previously.
> >> >
> >> > How does this look, software-engineering wise? Would you be looking
> >> > up the IRQ number from the DT in the KVM code again? How does KVM then
> >> > synchronize with the host PMU driver so they're not both requesting the
> >> > same IRQ at the same time?
> >>
> >> We only look up the host PMU irq numbers from the DT at HYP init time.
> >>
> >> During the context switch we know the host PMU irq number for
> >> the current host CPU, so we can get the state of the host PMU irq
> >> in the context switch code.
> >>
> >> If we go by the shared irq handler approach then both KVM
> >> and the host PMU driver will do request_irq() on the same host
> >> PMU irq. In other words, there is no virtual PMU irq provided
> >> by HW for the guest.
> >>
> >
> > Sorry for the *really* long delay in this response.
> >
> > We had a chat about this subject with Will Deacon and Marc Zyngier
> > during Connect, and basically we came to think of a number of problems
> > with the current approach:
> >
> > 1. As you pointed out, there is a need for a shared IRQ handler, and
> > there is no immediately nice way to implement this without a more
> > sophisticated perf/kvm interface, probably comprising eventfds or
> > something similar.
> >
> > 2. Hijacking the counters for the VM without perf knowing about it
> > basically makes it impossible to do system-wide event counting, an
> > important use case for a virtualization host.
> >
> > So the approach we will be taking now is to:
> >
> > First, implement a strict trap-and-emulate-in-software approach. This
> > would allow any software relying on access to performance counters to
> > work, although potentially with slightly imprecise values. This is the
> > approach taken by x86 and would be significantly simpler to support on
> > systems like big.LITTLE as well.
>
> Actually, trap-and-emulate would also help avoid additions to the
> KVM world switch.
>
> >
> > Second, if there are values obtained from within the guest that are so
> > skewed by the trap-and-emulate approach that we need to give the guest
> > access to counters, we should try to share the hardware by partitioning
> > the physical counters, but again, we need to coordinate with the host
> > perf system for this. We would only be pursuing this approach if
> > absolutely necessary.
>
> Yes, with trap-and-emulate we cannot accurately emulate all types
> of hw counters (particularly cache misses and similar events).
>
> >
> > Apologies for the change in direction on this.
> >
> > What are your thoughts? Do you still have time/interest to work
> > on any of this?
>
> It's a drastic change in direction.
>
> Currently, I have taken up some different work (not related to KVM),
> so for the next few months I won't be able to spend time on this.
>
> It's better if Linaro takes over this work to avoid any further delays.
>
ok, will do, thanks for being responsive and putting in the effort so far!

Best,
-Christoffer
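
(For readers of the archive: a rough, self-contained sketch of the
trap-and-emulate idea settled on above, assuming the hypervisor traps
guest PMU register accesses, e.g. via MDCR_EL2.TPM, and maintains the
counters purely in software. Every name below is hypothetical and this
is not the code that eventually went into KVM/ARM; the PMCR bit
positions match the ARM PMUv3 definition, the rest is illustration.)

#include <stdint.h>

#define PMCR_E	(1U << 0)	/* PMCR.E: global counter enable */
#define PMCR_C	(1U << 2)	/* PMCR.C: cycle counter reset */

struct vcpu_pmu {
	uint32_t pmcr;			/* emulated PMCR state */
	uint64_t cycles;		/* emulated cycle counter */
	uint64_t last_host_cycles;	/* host cycles at last update */
};

/* On a trapped guest read of PMCCNTR: derive the value in software. */
static uint64_t pmu_emu_read_pmccntr(struct vcpu_pmu *pmu,
				     uint64_t host_cycles)
{
	/* Only accumulate elapsed host cycles while counting is enabled. */
	if (pmu->pmcr & PMCR_E)
		pmu->cycles += host_cycles - pmu->last_host_cycles;
	pmu->last_host_cycles = host_cycles;
	return pmu->cycles;
}

/* On a trapped guest write of PMCR: update the emulated control state. */
static void pmu_emu_write_pmcr(struct vcpu_pmu *pmu, uint32_t val,
			       uint64_t host_cycles)
{
	if (val & PMCR_C)
		pmu->cycles = 0;
	pmu->pmcr = val & ~PMCR_C;	/* reset bits read back as zero */
	pmu->last_host_cycles = host_cycles;
}

Because the guest never touches the physical counters, the host perf
subsystem keeps full ownership of the hardware, which is exactly the
system-wide-counting concern raised in point 2 above.
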