Re: [PATCH v2 3/4] arm64: arm_pmu: Add support for exclude_host/exclude_guest attributes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 20, 2018 at 02:49:05PM +0000, Suzuki K Poulose wrote:
> On 11/20/2018 02:15 PM, Andrew Murray wrote:
> > Add support for the :G and :H attributes in perf by handling the
> > exclude_host/exclude_guest event attributes.
> > 
> > We notify KVM of counters that we wish to be enabled or disabled on
> > guest entry/exit and thus defer from starting or stopping :G events
> > as per the events exclude_host attribute.
> > 
> > When using VHE, EL2 is unused by the guest - therefore we can filter
> > out these events with the PMU as per the 'exclude_host' attribute.
> > 
> > With both VHE and non-VHE we switch the counters between host/guest
> > at EL2. With non-VHE when using 'exclude_host' we filter out EL2.
> > 
> > These changes eliminate counters counting host events on the
> > boundaries of guest entry/exit when using :G. However when using :H
> > unless exclude_hv is set on non-VHE then there is a small blackout
> > window at the guest entry/exit where host events are not captured.
> > 
> > Signed-off-by: Andrew Murray <andrew.murray@xxxxxxx>
> > ---
> >   arch/arm64/kernel/perf_event.c | 41 +++++++++++++++++++++++++++++++++++------
> >   1 file changed, 35 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> > index de564ae..412bd80 100644
> > --- a/arch/arm64/kernel/perf_event.c
> > +++ b/arch/arm64/kernel/perf_event.c
> > @@ -26,6 +26,7 @@
> >   #include <linux/acpi.h>
> >   #include <linux/clocksource.h>
> > +#include <linux/kvm_host.h>
> >   #include <linux/of.h>
> >   #include <linux/perf/arm_pmu.h>
> >   #include <linux/platform_device.h>
> > @@ -647,11 +648,23 @@ static inline int armv8pmu_enable_counter(int idx)
> >   static inline void armv8pmu_enable_event_counter(struct perf_event *event)
> >   {
> > +	struct perf_event_attr *attr = &event->attr;
> >   	int idx = event->hw.idx;
> > +	u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> > -	armv8pmu_enable_counter(idx);
> >   	if (armv8pmu_event_is_chained(event))
> > -		armv8pmu_enable_counter(idx - 1);
> > +		counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> 
> minor nit: If you rearrange the code below a bit
> 
> > +
> > +	if (attr->exclude_host)
> > +		kvm_set_clr_guest_pmu_events(0, counter_bits);
> > +	if (attr->exclude_guest)
> > +		kvm_set_clr_host_pmu_events(0, counter_bits);
> > +
> > +	if (!attr->exclude_host) {
> > +		armv8pmu_enable_counter(idx);
> > +		if (armv8pmu_event_is_chained(event))
> > +			armv8pmu_enable_counter(idx - 1);
> 
> we could have :
> 
> 	if (attr->exclude_guest)
> 		kvm_set_clr_host_pmu_events(0, counter_bits);
> 
> 	if (attr->exclude_host) {
> 		kvm_set_clr_guest_pmu_events(0, counter_bits);
> 		return;
> 	}
> 
> 	armv8pmu_enable_counter(idx);
> 	if (armv8pmu_event_is_chained(event))
> 		armv8pmu_enable_counter(idx - 1);
> 
> Similarly for disable_event_counter.

I'm not really sure this is more readable. It makes the symmetric
clr_{guest|host} calls asymmetric. And moves the condition for
enabling counters away from the code that conditionally enables
it. Though it does get rid of one if statement...

> 	
> > +	}
> >   }
> >   static inline int armv8pmu_disable_counter(int idx)
> > @@ -664,11 +677,23 @@ static inline int armv8pmu_disable_counter(int idx)
> >   static inline void armv8pmu_disable_event_counter(struct perf_event *event)
> >   {
> >   	struct hw_perf_event *hwc = &event->hw;
> > +	struct perf_event_attr *attr = &event->attr;
> >   	int idx = hwc->idx;
> > +	u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> >   	if (armv8pmu_event_is_chained(event))
> > -		armv8pmu_disable_counter(idx - 1);
> > -	armv8pmu_disable_counter(idx);
> > +		counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > +
> > +	if (attr->exclude_host)
> > +		kvm_set_clr_guest_pmu_events(counter_bits, 0);
> > +	if (attr->exclude_guest)
> > +		kvm_set_clr_host_pmu_events(counter_bits, 0);
> > +
> > +	if (!attr->exclude_host) {
> > +		if (armv8pmu_event_is_chained(event))
> > +			armv8pmu_disable_counter(idx - 1);
> > +		armv8pmu_disable_counter(idx);
> > +	}
> >   }
> >   static inline int armv8pmu_enable_intens(int idx)
> > @@ -945,12 +970,12 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
> >   	 * with other architectures (x86 and Power).
> >   	 */
> >   	if (is_kernel_in_hyp_mode()) {
> > -		if (!attr->exclude_kernel)
> > +		if (!attr->exclude_kernel && !attr->exclude_host)
> >   			config_base |= ARMV8_PMU_INCLUDE_EL2;
> 
> Shouldn't we handle "exclude_kernel" for a "Guest" event ?
> i.e, what if we have exclude_kernel + exclude_host set ? Doesn't
> the "exclude_kernel" apply to the event filtering after we enter
> guest and thus, we need to set the EXCLUDE_EL1 ?

Yes you're right. This is a problem not just for a VHE host's guest but
for the host as well - for example perf -e instructions:h and
-e instructions:G will currently give the same count.

> 
> Also I am wondering what is the situation with Nested virtualisation
> coming in. i.e, if the host hyp wanted to profile the guest hyp, should
> we set EL2 events ? I understand this is something which should be
> solved with the nested virt changes. But it would be good to see
> if we could filter "exclude_host" and "exclude_guest" at the hypervisor
> level (i.e, in software, without PMU filtering) to allow the normal
> controls to make use of the hardware filtering ?

It took me a while to think this through and I think you're right in
that we can do more to future proof this for nested virt. In fact we
don't depend on the hunks in this function (i.e. addition of extra
'!attr->exclude_host' conditions) - we don't need them due to the
enable/disable of counters at guest entry/exit.

With the assumption of the hypervisor switch enabling/disabling
host/guest counters I think we can now do the following:

        /*
         * If we're running in hyp mode, then we *are* the hypervisor.
         * Therefore we ignore exclude_hv in this configuration, since
         * there's no hypervisor to sample anyway. This is consistent
         * with other architectures (x86 and Power).
         */
        if (is_kernel_in_hyp_mode())
                if (!attr->exclude_kernel)
                        config_base |= ARMV8_PMU_INCLUDE_EL2;
        else
                if (!attr->exclude_hv)
                        config_base |= ARMV8_PMU_INCLUDE_EL2;

        if (attr->exclude_kernel)
                config_base |= ARMV8_PMU_EXCLUDE_EL1;

        if (attr->exclude_user)
                config_base |= ARMV8_PMU_EXCLUDE_EL0;

Also for nested virt we'd need to update
kvm_pmu_set_counter_event_type to ensure exclude_kernel is set when
the guest attempts to include EL2 to count a VHE guest kernel.

Does this look right?

Thanks,

Andrew Murray

> 
> >   	} else {
> >   		if (attr->exclude_kernel)
> >   			config_base |= ARMV8_PMU_EXCLUDE_EL1;
> > -		if (!attr->exclude_hv)
> > +		if (!attr->exclude_hv && !attr->exclude_host)
> >   			config_base |= ARMV8_PMU_INCLUDE_EL2;
> >   	}
> 
> 
> >   	if (attr->exclude_user)
> > @@ -976,6 +1001,10 @@ static void armv8pmu_reset(void *info)
> >   		armv8pmu_disable_intens(idx);
> >   	}
> > +	/* Clear the counters we flip at guest entry/exit */
> > +	kvm_set_clr_host_pmu_events(U32_MAX, 0);
> > +	kvm_set_clr_guest_pmu_events(U32_MAX, 0);
> > +
> >   	/*
> >   	 * Initialize & Reset PMNC. Request overflow interrupt for
> >   	 * 64 bit cycle counter but cheat in armv8pmu_write_counter().
> > 
> 
> Suzuki
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm



[Index of Archives]     [Linux KVM]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux