RE: [PATCH v3 2/3] perf: add arm64 smmuv3 pmu driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@xxxxxxx]
> Sent: 02 October 2018 15:11
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>;
> lorenzo.pieralisi@xxxxxxx; robin.murphy@xxxxxxx
> Cc: mark.rutland@xxxxxxx; vkilari@xxxxxxxxxxxxxx;
> neil.m.leeder@xxxxxxxxx; pabba@xxxxxxxxxxxxxx; John Garry
> <john.garry@xxxxxxxxxx>; will.deacon@xxxxxxx; rruigrok@xxxxxxxxxxxxxx;
> Linuxarm <linuxarm@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; linux-
> acpi@xxxxxxxxxxxxxxx; Guohanjun (Hanjun Guo) <guohanjun@xxxxxxxxxx>;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH v3 2/3] perf: add arm64 smmuv3 pmu driver
> 
> Hi Shameer,
> 
> I have a few comments below, mostly naive since I don't know anything
> about perf drivers.

Thanks for taking a look at this.

> On 21/09/2018 16:08, Shameer Kolothum wrote:
> > From: Neil Leeder <nleeder@xxxxxxxxxxxxxx>
> >
> > Adds a new driver to support the SMMUv3 PMU and add it into the
> > perf events framework.
> >
> > Each SMMU node may have multiple PMUs associated with it, each of
> > which may support different events.
> >
> > SMMUv3 PMCG devices are named as smmuv3_pmcg_<phys_addr_page>
> where
> > <phys_addr_page> is the physical page address of the SMMU PMCG.
> > For example, the PMCG at 0xff88840000 is named smmuv3_pmcg_ff88840
> >
> > Filtering by stream id is done by specifying filtering parameters
> > with the event. options are:
> >    filter_enable    - 0 = no filtering, 1 = filtering enabled
> >    filter_span      - 0 = exact match, 1 = pattern match
> >    filter_stream_id - pattern to filter against
> > Further filtering information is available in the SMMU documentation.
> >
> > Example: perf stat -e smmuv3_pmcg_ff88840/transaction,filter_enable=1,
> >                        filter_span=1,filter_stream_id=0x42/ -a pwd
> > Applies filter pattern 0x42 to transaction events.
> >
> > SMMU events are not attributable to a CPU, so task mode and sampling
> > are not supported.
> >
> > Signed-off-by: Neil Leeder <nleeder@xxxxxxxxxxxxxx>
> > Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@xxxxxxxxxx>
> > ---
> >  drivers/perf/Kconfig          |   9 +
> >  drivers/perf/Makefile         |   1 +
> >  drivers/perf/arm_smmuv3_pmu.c | 736
> ++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 746 insertions(+)
> >  create mode 100644 drivers/perf/arm_smmuv3_pmu.c
> >
> > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> > index 08ebaf7..34969dd 100644
> > --- a/drivers/perf/Kconfig
> > +++ b/drivers/perf/Kconfig
> > @@ -52,6 +52,15 @@ config ARM_PMU_ACPI
> >  	depends on ARM_PMU && ACPI
> >  	def_bool y
> >
> > +config ARM_SMMU_V3_PMU
> > +	 bool "ARM SMMUv3 Performance Monitors {Extension}"
> 
> Why the curly braces? I didn't find that notation in other Kconfig files

Hmm..That's probably because I just copied a suggestion from previous
review. I will double check and correct it.

> > +	 depends on ARM64 && ACPI && ARM_SMMU_V3
> > +	   help
> > +	   Provides support for the SMMU version 3 performance monitor unit
> (PMU)
> > +	   on ARM-based systems.
> > +	   Adds the SMMU PMU into the perf events subsystem for
> > +	   monitoring SMMU performance events.
> > +
> >  config ARM_DSU_PMU
> >  	tristate "ARM DynamIQ Shared Unit (DSU) PMU"
> >  	depends on ARM64
> > diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> > index b3902bd..f10a932 100644
> > --- a/drivers/perf/Makefile
> > +++ b/drivers/perf/Makefile
> [...]
> > +/*
> > + * This driver adds support for perf events to use the Performance
> > + * Monitor Counter Groups (PMCG) associated with an SMMUv3 node
> > + * to monitor that node.
> > + *
> > + * SMMUv3 PMCG devices are named as
> smmuv3_pmcg_<phys_addr_page> where
> > + * <phys_addr_page> is the physical page address of the SMMU PMCG.
> > + * For example, the PMCG at 0xff88840000 is named
> smmuv3_pmcg_ff88840
> > +
> > + * Filtering by stream id is done by specifying filtering parameters
> > + * with the event. options are:
> > + *   filter_enable    - 0 = no filtering, 1 = filtering enabled
> > + *   filter_span      - 0 = exact match, 1 = pattern match
> > + *   filter_stream_id - pattern to filter against
> > + * Further filtering information is available in the SMMU documentation.
> > + *
> > + * Example: perf stat -e
> smmuv3_pmcg_ff88840/transaction,filter_enable=1,
> > + *                       filter_span=1,filter_stream_id=0x42/ -a pwd
> 
> I'm curious, why is pwd used as example? Wouldn't something like netperf
> be a more realistic workload?

Agree. That’s a more relevant workload example.

> > + * Applies filter pattern 0x42 to transaction events.
> 
> Adding that this pattern matches SIDs 0x42 and 0x43 might be helpful,
> since span filtering is a bit awkward

Ok.

> [...]
> > +#define SMMU_PMU_EVENT_ATTR_EXTRACTOR(_name, _config, _size,
> _shift)    \
> > +	static inline u32 get_##_name(struct perf_event *event)         \
> > +	{                                                               \
> > +		return (event->attr._config >> (_shift)) &              \
> > +			GENMASK_ULL((_size) - 1, 0);                    \
> 
> FIELD_GET could make this slightly nicer

Sure.

> > +	}
> > +
> > +SMMU_PMU_EVENT_ATTR_EXTRACTOR(event, config, 16, 0);
> > +SMMU_PMU_EVENT_ATTR_EXTRACTOR(filter_stream_id, config1, 32, 0);
> > +SMMU_PMU_EVENT_ATTR_EXTRACTOR(filter_span, config1, 1, 32);
> > +SMMU_PMU_EVENT_ATTR_EXTRACTOR(filter_enable, config1, 1, 34);
> 
> filter_enable is at bit 33, not 34

Thanks.
 
> [...]
> > +static void smmu_pmu_set_period(struct smmu_pmu *smmu_pmu,
> > +				struct hw_perf_event *hwc)
> > +{
> > +	u32 idx = hwc->idx;
> > +	u64 new;
> > +
> > +	/*
> > +	 * We limit the max period to half the max counter value of the
> counter
> > +	 * size, so that even in the case of extreme interrupt latency the
> > +	 * counter will (hopefully) not wrap past its initial value.
> > +	 */
> > +	new = smmu_pmu->counter_mask >> 1;
> 
> Cool trick :)
> 
> > +
> > +	local64_set(&hwc->prev_count, new);
> > +	smmu_pmu_counter_set_value(smmu_pmu, idx, new);
> > +}
> > +
> > +static unsigned int smmu_pmu_get_event_idx(struct smmu_pmu
> *smmu_pmu)
> > +{
> > +	unsigned int idx;
> > +	unsigned int num_ctrs = smmu_pmu->num_counters;
> > +
> > +	idx = find_first_zero_bit(smmu_pmu->used_counters, num_ctrs);
> > +	if (idx == num_ctrs)
> > +		/* The counters are all in use. */
> > +		return -EAGAIN;
> 
> Then this function's return type probably shouldn't be unsigned

Oops!

> > +
> > +	set_bit(idx, smmu_pmu->used_counters);
> > +
> > +	return idx;
> > +}
> > +
> > +/*
> > + * Implementation of abstract pmu functionality required by
> > + * the core perf events code.
> > + */
> > +
> > +static int smmu_pmu_event_init(struct perf_event *event)
> > +{
> > +	struct hw_perf_event *hwc = &event->hw;
> > +	struct smmu_pmu *smmu_pmu = to_smmu_pmu(event->pmu);
> > +	struct device *dev = smmu_pmu->dev;
> > +	struct perf_event *sibling;
> > +	u32 event_id;
> > +
> > +	if (event->attr.type != event->pmu->type)
> > +		return -ENOENT;
> > +
> > +	if (hwc->sample_period) {
> > +		dev_dbg_ratelimited(dev, "Sampling not supported\n");
> > +		return -EOPNOTSUPP;
> > +	}
> > +
> > +	if (event->cpu < 0) {
> > +		dev_dbg_ratelimited(dev, "Per-task mode not supported\n");
> > +		return -EOPNOTSUPP;
> > +	}
> > +
> > +	/* We cannot filter accurately so we just don't allow it. */
> > +	if (event->attr.exclude_user || event->attr.exclude_kernel ||
> > +	    event->attr.exclude_hv || event->attr.exclude_idle) {
> > +		dev_dbg_ratelimited(dev, "Can't exclude execution levels\n");
> > +		return -EOPNOTSUPP;
> > +	}
> > +
> > +	/* Verify specified event is supported on this PMU */
> > +	event_id = get_event(event);
> > +	if (((event_id < SMMU_ARCH_MAX_EVENT_ID) &&
> > +	    (!test_bit(event_id, smmu_pmu->supported_events))) ||
> > +	    (event_id > SMMU_IMPDEF_MAX_EVENT_ID)) {
> 
> >= ?

I was slightly confused by the spec here as it says,

Performance events are indicated by a numeric ID, in the following ranges:
• 0x0000 to 0x007F: Architected events
• 0x0080 to 0xFFFF: IMPLEMENTATION DEFINED events

It looks to me the ids are valid including those limits.

> > +		dev_dbg_ratelimited(dev, "Invalid event %d for this PMU\n",
> > +				    event_id);
> > +		return -EINVAL;
> > +	}
> 
> [...]
> > +static struct attribute *smmu_pmu_events[] = {
> > +	SMMU_EVENT_ATTR(cycles, 0),
> > +	SMMU_EVENT_ATTR(transaction, 1),
> > +	SMMU_EVENT_ATTR(tlb_miss, 2),
> > +	SMMU_EVENT_ATTR(config_cache_miss, 3),
> > +	SMMU_EVENT_ATTR(trans_table_walk, 4),
> 
> This name is a bit misleading - as far as I understand the event doesn't
> count table walks, but memory accesses performed during a walk.

Ok. I will take a look at this.

> > +	SMMU_EVENT_ATTR(config_struct_access, 5),
> > +	SMMU_EVENT_ATTR(pcie_ats_trans_rq, 6),
> > +	SMMU_EVENT_ATTR(pcie_ats_trans_passed, 7),
> > +	NULL
> > +};
> 
> [...]
> > +static int smmu_pmu_setup_irq(struct smmu_pmu *pmu)
> > +{
> > +	unsigned long flags = IRQF_NOBALANCING | IRQF_SHARED |
> IRQF_NO_THREAD;
> 
> Why do you need SHARED?

Though I am not aware of any implementation that is sharing this
interrupt, I think it is good to keep it that way as we are anyway
checking for the OVSSET0 status register in the interrupt handler.

> > +	int irq, ret = -ENXIO;
> > +
> > +	irq = pmu->irq;
> > +	if (irq)
> > +		ret = devm_request_irq(pmu->dev, irq,
> smmu_pmu_handle_irq,
> > +				       flags, "smmuv3-pmu", pmu);
> > +	return ret;
> > +}
> > +
> > +static void smmu_pmu_reset(struct smmu_pmu *smmu_pmu)
> > +{
> > +	smmu_pmu_disable(&smmu_pmu->pmu);
> > +
> > +	/* Disable counter and interrupt */
> > +	writeq_relaxed(smmu_pmu->counter_present_mask,
> > +		       smmu_pmu->reg_base + SMMU_PMCG_CNTENCLR0);
> > +	writeq_relaxed(smmu_pmu->counter_present_mask,
> > +		       smmu_pmu->reg_base + SMMU_PMCG_INTENCLR0);
> > +	writeq_relaxed(smmu_pmu->counter_present_mask,
> > +		       smmu_pmu->reloc_base + SMMU_PMCG_OVSCLR0);
> > +
> > +	writeq_relaxed(0, smmu_pmu->reg_base + SMMU_PMCG_IRQ_CFG0);
> 
> smmu_pmu_setup_msi clears CFG0 a second time, so this line can be
> deleted in patch 3, or moved to smmu_pmu_setup_msi right away.

Ok. I will move this to smmu_pmu_setup_msi.

> > +}
> > +
> > +static int smmu_pmu_probe(struct platform_device *pdev)
> > +{
> > +	struct smmu_pmu *smmu_pmu;
> > +	struct resource *res_0, *res_1;
> > +	u32 cfgr, reg_size;
> > +	u64 ceid_64[2];
> > +	int irq, err;
> > +	char *name;
> > +	struct device *dev = &pdev->dev;
> > +
> > +	smmu_pmu = devm_kzalloc(dev, sizeof(*smmu_pmu), GFP_KERNEL);
> > +	if (!smmu_pmu)
> > +		return -ENOMEM;
> > +
> > +	smmu_pmu->dev = dev;
> > +
> > +	platform_set_drvdata(pdev, smmu_pmu);
> > +	smmu_pmu->pmu = (struct pmu) {
> > +		.task_ctx_nr    = perf_invalid_context,
> > +		.pmu_enable	= smmu_pmu_enable,
> > +		.pmu_disable	= smmu_pmu_disable,
> > +		.event_init	= smmu_pmu_event_init,
> > +		.add		= smmu_pmu_event_add,
> > +		.del		= smmu_pmu_event_del,
> > +		.start		= smmu_pmu_event_start,
> > +		.stop		= smmu_pmu_event_stop,
> > +		.read		= smmu_pmu_event_read,
> > +		.attr_groups	= smmu_pmu_attr_grps,
> > +	};
> > +
> > +	res_0 = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +	smmu_pmu->reg_base = devm_ioremap_resource(dev, res_0);
> > +	if (IS_ERR(smmu_pmu->reg_base))
> > +		return PTR_ERR(smmu_pmu->reg_base);
> > +
> > +	cfgr = readl_relaxed(smmu_pmu->reg_base + SMMU_PMCG_CFGR);
> > +
> > +	/* Determine if page 1 is present */
> > +	if (cfgr & SMMU_PMCG_CFGR_RELOC_CTRS) {
> > +		res_1 = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> > +		smmu_pmu->reloc_base = devm_ioremap_resource(dev,
> res_1);
> > +		if (IS_ERR(smmu_pmu->reloc_base))
> > +			return PTR_ERR(smmu_pmu->reloc_base);
> > +	} else {
> > +		smmu_pmu->reloc_base = smmu_pmu->reg_base;
> > +	}
> > +
> > +	irq = platform_get_irq(pdev, 0);
> > +	if (irq > 0)
> > +		smmu_pmu->irq = irq;
> > +
> > +	ceid_64[0] = readq_relaxed(smmu_pmu->reg_base +
> SMMU_PMCG_CEID0);
> > +	ceid_64[1] = readq_relaxed(smmu_pmu->reg_base +
> SMMU_PMCG_CEID1);
> > +	bitmap_from_arr32(smmu_pmu->supported_events, (u32 *)ceid_64,
> > +			  SMMU_ARCH_MAX_EVENT_ID);
> > +
> > +	smmu_pmu->num_counters =
> FIELD_GET(SMMU_PMCG_CFGR_NCTR_MASK, cfgr) + 1;
> > +	smmu_pmu->counter_present_mask = GENMASK(smmu_pmu-
> >num_counters - 1, 0);
> > +
> > +	reg_size = FIELD_GET(SMMU_PMCG_CFGR_SIZE_MASK, cfgr);
> > +	smmu_pmu->counter_mask = GENMASK_ULL(reg_size, 0);
> > +
> > +	smmu_pmu_reset(smmu_pmu);
> > +
> > +	err = smmu_pmu_setup_irq(smmu_pmu);
> > +	if (err) {
> > +		dev_err(dev, "Setup irq failed, PMU @%pa\n", &res_0->start);
> 
> You can probably remove "PMU @%pa" from error and info messages, since
> the device name already uniquely identifies it:
> "[    6.168200] arm-smmu-v3-pmu 2b442000.smmu-pmcg: Registered SMMU
> PMU
> @ 0x000000002b442000 using 4 counters"

Interesting. What I have is,

[   25.669636] arm-smmu-v3-pmu arm-smmu-v3-pmu.6.auto: Registered SMMU
PMU @ 0x0000000148001000 using 8 counters

Are you using the same patches and is booting using ACPI? IIRC, in the iort
code  it uses the name "arm-smmu-v3-pmu" and AUTO id to register/add the platform
dev. So not sure, how it is printing the address in your case. 

Please check and let me know.

Thanks,
Shameer
 
> Thanks,
> Jean




[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux