On Thu, Aug 17, 2017 at 03:52:24PM +0100, Suzuki K Poulose wrote: > On 16/08/17 15:10, Mark Rutland wrote: > >On Tue, Aug 08, 2017 at 12:37:26PM +0100, Suzuki K Poulose wrote: > >>+static struct attribute *dsu_pmu_event_attrs[] = { > >>+ DSU_EVENT_ATTR(cycles, 0x11), > >>+ DSU_EVENT_ATTR(bus_acecss, 0x19), > >>+ DSU_EVENT_ATTR(memory_error, 0x1a), > >>+ DSU_EVENT_ATTR(bus_cycles, 0x1d), > >>+ DSU_EVENT_ATTR(l3d_cache_allocate, 0x29), > >>+ DSU_EVENT_ATTR(l3d_cache_refill, 0x2a), > >>+ DSU_EVENT_ATTR(l3d_cache, 0x2b), > >>+ DSU_EVENT_ATTR(l3d_cache_wb, 0x2c), > > > >MAX_COMMON_EVENTS seems to be 0x40, so are we just assuming the below > >are implemented? > > > >If so, why bother exposing them at all? We can't know that they're going > >to work... > > Thats right. The only defending argument is that the event codes are specific > to the DynamIQ, unlike the COMMON_EVENTS which matches the ARMv8 PMU event codes. > So, someone would need to carefully find the event code for a particular event. > Having these entries would make it easier for the user to do the profiling. > > Also, future revisions of the DSU could potentially expose more events. So there > wouldn't be any way to tell the user (provided there aren't any changes to the > programming model and we decide to reuse the same compatible string) what we *could* > potentially support. In short, this is not a problem at the moment and we could > do something about it as and when required. I'd rather that we only describes those that we can probe from the PMCEID* registers, and for the rest, left the user to consult a manual. I can well imagine future variants of this IP supporing different events, and I'd prefer to avoid poriflerating tables for those. [...] > >>+static struct attribute *dsu_pmu_cpumask_attrs[] = { > >>+ DSU_CPUMASK_ATTR(cpumask, DSU_ACTIVE_CPU_MASK), > >>+ DSU_CPUMASK_ATTR(supported_cpus, DSU_SUPPORTED_CPU_MASK), > >>+ NULL, > >>+}; > > > >Normally we only expose one mask. > > > >Why do we need the supported cpu mask? What is the intended use-case? > > Thats just to let the user know the CPUs bound to this PMU instance. I guess that can be useful, though the cpumasks we expose today are confusing as-is, and this is another point of confusion. We could drop this for now, and add it when requested, or we should try to make the naming clearer somehow -- "supported" can be read in a number of ways. Further, it would be worth documenting this PMU under Documentation/perf/. [...] > >>+static int dsu_pmu_add(struct perf_event *event, int flags) > >>+{ > >>+ struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu); > >>+ struct dsu_hw_events *hw_events = &dsu_pmu->hw_events; > >>+ struct hw_perf_event *hwc = &event->hw; > >>+ int idx; > >>+ > >>+ if (!cpumask_test_cpu(smp_processor_id(), &dsu_pmu->supported_cpus)) > >>+ return -ENOENT; > > > >This shouldn't ever happen, and we can check against the active cpumask, > >with a WARN_ON_ONCE(). We have to do this for CPU PMUs since they > >support events which can migrate with tasks, but that's not the case > >here. > > > >[...] > > But, we have to make sure we are reading the events from a CPU within this > DSU in case we have multiple DSU units. Regardless of how many instances there are, the core should *never* add() a CPU-bound event (which DSU events are) on another CPU. To do so would be a major bug. So if this is just a paranoid check, we should WARN_ON_ONCE(). Otherwise, it's unnecessary. > >>+/** > >>+ * dsu_pmu_dt_get_cpus: Get the list of CPUs in the cluster. > >>+ */ > >>+static int dsu_pmu_dt_get_cpus(struct device_node *dev, cpumask_t *mask) > >>+{ > >>+ int i = 0, n, cpu; > >>+ struct device_node *cpu_node; > >>+ > >>+ n = of_count_phandle_with_args(dev, "cpus", NULL); > >>+ if (n <= 0) > >>+ return -ENODEV; > >>+ for (; i < n; i++) { > >>+ cpu_node = of_parse_phandle(dev, "cpus", i); > >>+ if (!cpu_node) > >>+ break; > >>+ cpu = of_cpu_node_to_id(cpu_node); > >>+ of_node_put(cpu_node); > >>+ if (cpu < 0) > >>+ break; > > > >I believe this can happen if the kernel's nr_cpus were capped below the > >number of CPUs in the system. So we need to iterate all entries of the > >cpus list regardless. > > > > Good point. Initial version of the driver used to ignore the failures, but > was changed later. I will roll it back. Thanks. If you could add a comment as to why, that'll hopefully avoid anyone trying to "fix" the logic later. [...] > >>+ cpmcr = __dsu_pmu_read_pmcr(); > >>+ dsu_pmu->num_counters = ((cpmcr >> CLUSTERPMCR_N_SHIFT) & > >>+ CLUSTERPMCR_N_MASK); > >>+ if (!dsu_pmu->num_counters) > >>+ return; > > > >Is that valid? What range of values makes sense? > > > >[...] > > > > We should at least have one counter implemented (excluding the cycle counter). > And yes, we should check if the num_counters <= 31. Ok. > >>+ /* > >>+ * Find one CPU in the DSU to handle the IRQs. > >>+ * It is highly unlikely that we would fail > >>+ * to find one, given the probing has succeeded. > >>+ */ > >>+ cpu = dsu_pmu_get_online_cpu(dsu_pmu); > >>+ if (cpu >= nr_cpu_ids) > >>+ return -ENODEV; > >>+ cpumask_set_cpu(cpu, &dsu_pmu->active_cpu); > >>+ rc = irq_set_affinity_hint(irq, &dsu_pmu->active_cpu); > > > >Is setting the affinity hint strong enough? > > > >Can the affinity be changed behind the back of this driver? > > Did you mean to use "force"d affinity settings ? If so, modules > don't have the luxury of doing that. Perhaps. We absolutely need to ensure that the driver makes the IRQ affine to the active CPU, and no other SW will change the affinitiy of the IRQ. Otherwise, the IRQ handler is dangerous, violating locking requirements, potentially corrupting memory, etc. > Hence this one. I think that also brings up the problem where we could > be reading the counters from a different CPU than we requested. So, I > think it would be good to keep the CPU check, wherever we could access > the PMU. While I'm happy to have that as a paranoid sanity check, we cannot rely upon that for correctness. We must ensure that we amange the interupt affinity correctly. If that means we need the forced affinity helpers, we must ensure that we have access to those. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html