On 2015/10/26 19:33, Christoffer Dall wrote:
> On Thu, Sep 24, 2015 at 03:31:05PM -0700, Shannon Zhao wrote:
>> This patchset adds guest PMU support for KVM on ARM64. It takes a
>> trap-and-emulate approach: when the guest programs a counter to
>> monitor an event, the access traps to KVM, which calls the perf_event
>> API to create a corresponding perf event and uses the relevant
>> perf_event APIs to read the event's count value.
>>
>> Use perf to test this patchset in the guest. "perf list" shows the
>> hardware events and hardware cache events perf supports. Then use
>> "perf stat -e EVENT" to monitor an event, for example
>> "perf stat -e cycles" to count CPU cycles and
>> "perf stat -e cache-misses" to count cache misses.
>>
>> Below are the outputs of "perf stat -r 5 sleep 5" when running in the
>> host and in the guest.
>>
>> Host:
>>  Performance counter stats for 'sleep 5' (5 runs):
>>
>>          0.551428      task-clock (msec)         #    0.000 CPUs utilized            ( +-  0.91% )
>>                 1      context-switches          #    0.002 M/sec
>>                 0      cpu-migrations            #    0.000 K/sec
>>                48      page-faults               #    0.088 M/sec                    ( +-  1.05% )
>>           1150265      cycles                    #    2.086 GHz                      ( +-  0.92% )
>>   <not supported>      stalled-cycles-frontend
>>   <not supported>      stalled-cycles-backend
>>            526398      instructions              #    0.46  insns per cycle          ( +-  0.89% )
>>   <not supported>      branches
>>              9485      branch-misses             #   17.201 M/sec                    ( +-  2.35% )
>>
>>       5.000831616 seconds time elapsed                                          ( +-  0.00% )
>>
>> Guest:
>>  Performance counter stats for 'sleep 5' (5 runs):
>>
>>          0.730868      task-clock (msec)         #    0.000 CPUs utilized            ( +-  1.13% )
>>                 1      context-switches          #    0.001 M/sec
>>                 0      cpu-migrations            #    0.000 K/sec
>>                48      page-faults               #    0.065 M/sec                    ( +-  0.42% )
>>           1642982      cycles                    #    2.248 GHz                      ( +-  1.04% )
>>   <not supported>      stalled-cycles-frontend
>>   <not supported>      stalled-cycles-backend
>>            637964      instructions              #    0.39  insns per cycle          ( +-  0.65% )
>>   <not supported>      branches
>>             10377      branch-misses             #   14.198 M/sec                    ( +-  1.09% )
>>
>>       5.001289068 seconds time elapsed                                          ( +-  0.00% )
>
> This looks pretty cool!
>
> I'll review your next patch set version in more detail.
>
> Have you tried running a no-op cycle counter read test in the guest and
> in the host?
>
> Basically something like:
>
> static void nop(void *junk)
> {
> }
>
> static void test_nop(void)
> {
> 	unsigned long before, after;
>
> 	before = read_cycles();
> 	isb();
> 	nop(NULL);
> 	isb();
> 	after = read_cycles();
>
> 	pr_info("nop cost %lu cycles\n", after - before);
> }
>
> I would be very curious to see if we get a ~6000-cycle overhead in the
> guest compared to bare-metal, which I expect.
>
Ok, I'll try this while I'm doing more tests on v4.

> If we do, we should consider a hot path in the EL2 assembly code to
> read the cycle counter to reduce the overhead to something more precise.
>

-- 
Shannon
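
For reference, read_cycles() in the test above is left undefined. A
minimal sketch of what it could look like on ARM64, assuming the PMU
cycle counter has already been enabled (PMCR_EL0.E and PMCNTENSET_EL0.C
set) and is accessible at the current exception level:

static inline unsigned long read_cycles(void)
{
	unsigned long cval;

	/* Read the ARMv8 PMU cycle counter (PMCCNTR_EL0) directly. */
	asm volatile("mrs %0, pmccntr_el0" : "=r" (cval));
	return cval;
}

On bare metal this is a single system-register read; in a
trap-and-emulate guest each such read traps to KVM, which is exactly
the overhead the test is meant to measure.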
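
As a rough illustration of the trap-and-emulate flow described at the
top of the thread (not the actual patchset code), a trapped guest write
of an event number could be backed by a host perf event along these
lines. The helper names create_guest_counter() and read_guest_counter()
are hypothetical:

#include <linux/perf_event.h>

/*
 * Sketch only: back one guest PMU counter with a host perf event.
 * 'eventsel' is the event number the guest wrote to its event type
 * register; real code would also handle event filtering, sampling
 * and counter overflow. Returns an ERR_PTR() value on failure.
 */
static struct perf_event *create_guest_counter(u64 eventsel)
{
	struct perf_event_attr attr = { };

	attr.type = PERF_TYPE_RAW;
	attr.size = sizeof(attr);
	attr.config = eventsel;
	attr.pinned = 1;	/* keep the event on the PMU while scheduled */
	attr.exclude_hv = 1;	/* do not count hypervisor execution */

	/* task = current, cpu = -1: count only while this vcpu thread runs. */
	return perf_event_create_kernel_counter(&attr, -1, current,
						NULL, NULL);
}

/* Sketch only: emulate a guest read of the counter's current value. */
static u64 read_guest_counter(struct perf_event *event)
{
	u64 enabled, running;

	return perf_event_read_value(event, &enabled, &running);
}

Binding the event to the vcpu thread rather than to a physical CPU is
what lets the host attribute counts to the right guest context as the
thread migrates between CPUs.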