Hi Mark, Thanks for the quick review. On Tue, Apr 02, 2024 at 12:12:50PM +0100, Mark Rutland wrote: > On Tue, Apr 02, 2024 at 06:56:01PM +0800, Dawei Li wrote: > > Hi, > > > > This series try to eliminate direct cpumask var allocation from stack > > for perf subsystem. > > > > Direct/explicit allocation of cpumask on stack could be dangerous since > > it can lead to stack overflow for systems with big NR_CPUS or > > CONFIG_CPUMASK_OFFSTACK=y. > > > > For arm64, it's more urgent since commit 3fbd56f0e7c1 ("ARM64: Dynamically > > allocate cpumasks and increase supported CPUs to 512"). > > > > It's sort of a pattern that almost every cpumask var in perf subystem > > occurs in teardown callback of cpuhp. In which case, if dynamic > > allocation failed(which is unlikely), we choose return 0 rather than > > -ENOMEM to caller cuz: > > @teardown is not supposed to fail and if it does, system crashes: > > .. but we've left the system in an incorrect state, so that makes no sense. > > As I commented on the first patch, NAK to dynamically allocating cpumasks in > the CPUHP callbacks. Please allocate the necessry cpumasks up-front when we > probe the PMU. At that time we can handle an allocation failure by cleaning up > and failing to probe the PMU, and then the CPUHP callbacks don't need to > allocate memory to offline a CPU... Agreed that dynamically allocation in callbacks lead to inconsistency to system. My (original)alternative plan is simple but ugly, just make cpumask var _static_ and add extra static lock to protect it. The only difference between solution above and your proposal is static/ dynamic alloction. CPUHP's teardown cb is supposed to run in targetted cpuhp thread for most cases, and it's racy. Even the cpumask var is wrapped in dynamically allocated struct xxx_pmu, it's still shareable between different threads/contexts and needs proper protection. Simple as this(_untested_): diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c index 7ef9c7e4836b..fa89c3db4d7d 100644 --- a/drivers/perf/arm-cmn.c +++ b/drivers/perf/arm-cmn.c @@ -1950,18 +1950,24 @@ static int arm_cmn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_no struct arm_cmn *cmn; unsigned int target; int node; - cpumask_t mask; + static cpumask_t mask; + static DEFINE_SPINLOCK(cpumask_lock); cmn = hlist_entry_safe(cpuhp_node, struct arm_cmn, cpuhp_node); if (cpu != cmn->cpu) return 0; + spin_lock(&cpumask_lock); + node = dev_to_node(cmn->dev); if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) && cpumask_andnot(&mask, &mask, cpumask_of(cpu))) target = cpumask_any(&mask); else target = cpumask_any_but(cpu_online_mask, cpu); + + spin_unlock(&cpumask_lock); + if (target < nr_cpu_ids) arm_cmn_migrate(cmn, target); return 0; And yes, static allocation is evil :) Thanks, Dawei > > Also, for the titles it'd be better to say something like "avoid placing > cpumasks on the stack", because "explicit cpumask var allocation" sounds like > the use of alloc_cpumask_var(). Sound great! I will update it. > > Mark. > > > > > static int cpuhp_issue_call(int cpu, enum cpuhp_state state, bool bringup, > > struct hlist_node *node) > > { > > struct cpuhp_step *sp = cpuhp_get_step(state); > > int ret; > > > > /* > > * If there's nothing to do, we done. > > * Relies on the union for multi_instance. > > */ > > if (cpuhp_step_empty(bringup, sp)) > > return 0; > > /* > > * The non AP bound callbacks can fail on bringup. On teardown > > * e.g. module removal we crash for now. > > */ > > #ifdef CONFIG_SMP > > if (cpuhp_is_ap_state(state)) > > ret = cpuhp_invoke_ap_callback(cpu, state, bringup, node); > > else > > ret = cpuhp_invoke_callback(cpu, state, bringup, node, > > NULL); > > #else > > ret = cpuhp_invoke_callback(cpu, state, bringup, node, NULL); > > #endif > > BUG_ON(ret && !bringup); > > return ret; > > } > > > > Dawei Li (9): > > perf/alibaba_uncore_drw: Avoid explicit cpumask var allocation from > > stack > > perf/arm-cmn: Avoid explicit cpumask var allocation from stack > > perf/arm_cspmu: Avoid explicit cpumask var allocation from stack > > perf/arm_dsu: Avoid explicit cpumask var allocation from stack > > perf/dwc_pcie: Avoid explicit cpumask var allocation from stack > > perf/hisi_pcie: Avoid explicit cpumask var allocation from stack > > perf/hisi_uncore: Avoid explicit cpumask var allocation from stack > > perf/qcom_l2: Avoid explicit cpumask var allocation from stack > > perf/thunder_x2: Avoid explicit cpumask var allocation from stack > > > > drivers/perf/alibaba_uncore_drw_pmu.c | 13 +++++++++---- > > drivers/perf/arm-cmn.c | 13 +++++++++---- > > drivers/perf/arm_cspmu/arm_cspmu.c | 13 +++++++++---- > > drivers/perf/arm_dsu_pmu.c | 18 +++++++++++++----- > > drivers/perf/dwc_pcie_pmu.c | 17 +++++++++++------ > > drivers/perf/hisilicon/hisi_pcie_pmu.c | 15 ++++++++++----- > > drivers/perf/hisilicon/hisi_uncore_pmu.c | 13 +++++++++---- > > drivers/perf/qcom_l2_pmu.c | 15 ++++++++++----- > > drivers/perf/thunderx2_pmu.c | 20 ++++++++++++-------- > > 9 files changed, 92 insertions(+), 45 deletions(-) > > > > > > Thanks, > > > > Dawei > > > > -- > > 2.27.0 > > >