On Tue, Apr 02, 2024 at 09:40:08PM +0800, Dawei Li wrote: > Hi Mark, > > Thanks for the quick review. > > On Tue, Apr 02, 2024 at 12:12:50PM +0100, Mark Rutland wrote: > > On Tue, Apr 02, 2024 at 06:56:01PM +0800, Dawei Li wrote: > > > Hi, > > > > > > This series try to eliminate direct cpumask var allocation from stack > > > for perf subsystem. > > > > > > Direct/explicit allocation of cpumask on stack could be dangerous since > > > it can lead to stack overflow for systems with big NR_CPUS or > > > CONFIG_CPUMASK_OFFSTACK=y. > > > > > > For arm64, it's more urgent since commit 3fbd56f0e7c1 ("ARM64: Dynamically > > > allocate cpumasks and increase supported CPUs to 512"). > > > > > > It's sort of a pattern that almost every cpumask var in perf subystem > > > occurs in teardown callback of cpuhp. In which case, if dynamic > > > allocation failed(which is unlikely), we choose return 0 rather than > > > -ENOMEM to caller cuz: > > > @teardown is not supposed to fail and if it does, system crashes: > > > > .. but we've left the system in an incorrect state, so that makes no sense. > > > > As I commented on the first patch, NAK to dynamically allocating cpumasks in > > the CPUHP callbacks. Please allocate the necessry cpumasks up-front when we > > probe the PMU. At that time we can handle an allocation failure by cleaning up > > and failing to probe the PMU, and then the CPUHP callbacks don't need to > > allocate memory to offline a CPU... > > Agreed that dynamically allocation in callbacks lead to inconsistency > to system. > > My (original)alternative plan is simple but ugly, just make cpumask var > _static_ and add extra static lock to protect it. > > The only difference between solution above and your proposal is static/ > dynamic alloction. CPUHP's teardown cb is supposed to run in targetted > cpuhp thread for most cases, and it's racy. Even the cpumask var is > wrapped in dynamically allocated struct xxx_pmu, it's still shareable > between different threads/contexts and needs proper protection. I was under the impression that the cpuhp callbacks were *strictly* serialised. If that's not the case, the existing offlining callbacks are horrendously broken. Are you *certain* these can race? Regardless, adding additional locking here is not ok. > Simple as this(_untested_): > > diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c > index 7ef9c7e4836b..fa89c3db4d7d 100644 > --- a/drivers/perf/arm-cmn.c > +++ b/drivers/perf/arm-cmn.c > @@ -1950,18 +1950,24 @@ static int arm_cmn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_no > struct arm_cmn *cmn; > unsigned int target; > int node; > - cpumask_t mask; > + static cpumask_t mask; > + static DEFINE_SPINLOCK(cpumask_lock); > > cmn = hlist_entry_safe(cpuhp_node, struct arm_cmn, cpuhp_node); > if (cpu != cmn->cpu) > return 0; > > + spin_lock(&cpumask_lock); > + > node = dev_to_node(cmn->dev); > if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) && > cpumask_andnot(&mask, &mask, cpumask_of(cpu))) > target = cpumask_any(&mask); > else > target = cpumask_any_but(cpu_online_mask, cpu); > + > + spin_unlock(&cpumask_lock); > + > if (target < nr_cpu_ids) > arm_cmn_migrate(cmn, target); > return 0; Looking at this case, the only reason we need the mask is because it made the logic a little easier to write. All we really want is to choose some CPU in the intersection of two masks ignoring a specific CPU, and there was no helper function to do that. We can add a new helper to do that for us, which would avoid redundant work to manipulate the entire mask, and it would make the existing code simpler. I had a series a few years back to add cpumask_any_and_but(): https://lore.kernel.org/lkml/1486381132-5610-1-git-send-email-mark.rutland@xxxxxxx/ ... and that's easy to use here, e.g. | diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c | index 7ef9c7e4836b7..c6bbd387ccf8b 100644 | --- a/drivers/perf/arm-cmn.c | +++ b/drivers/perf/arm-cmn.c | @@ -1950,17 +1950,15 @@ static int arm_cmn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_no | struct arm_cmn *cmn; | unsigned int target; | int node; | - cpumask_t mask; | | cmn = hlist_entry_safe(cpuhp_node, struct arm_cmn, cpuhp_node); | if (cpu != cmn->cpu) | return 0; | | node = dev_to_node(cmn->dev); | - if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) && | - cpumask_andnot(&mask, &mask, cpumask_of(cpu))) | - target = cpumask_any(&mask); | - else | + target = cpumask_any_and_but(cpu_online_mask, cpumask_of_node(node), | + cpu); | + if (target >= nr_cpu_ids) | target = cpumask_any_but(cpu_online_mask, cpu); | if (target < nr_cpu_ids) | arm_cmn_migrate(cmn, target); It doesn't trivially rebase since the cpumask code has changed a fair amount, but I've managed to do that locally, and I can send that out as a seven-years-late v2 if it's useful. >From a quick scan, it looks like that'd handle all cases in this series. Are there any patterns in this series for which that would not be sufficient? Mark. > > And yes, static allocation is evil :) > > > Thanks, > > Dawei > > > > > Also, for the titles it'd be better to say something like "avoid placing > > cpumasks on the stack", because "explicit cpumask var allocation" sounds like > > the use of alloc_cpumask_var(). > > Sound great! I will update it. > > > > > Mark. > > > > > > > > static int cpuhp_issue_call(int cpu, enum cpuhp_state state, bool bringup, > > > struct hlist_node *node) > > > { > > > struct cpuhp_step *sp = cpuhp_get_step(state); > > > int ret; > > > > > > /* > > > * If there's nothing to do, we done. > > > * Relies on the union for multi_instance. > > > */ > > > if (cpuhp_step_empty(bringup, sp)) > > > return 0; > > > /* > > > * The non AP bound callbacks can fail on bringup. On teardown > > > * e.g. module removal we crash for now. > > > */ > > > #ifdef CONFIG_SMP > > > if (cpuhp_is_ap_state(state)) > > > ret = cpuhp_invoke_ap_callback(cpu, state, bringup, node); > > > else > > > ret = cpuhp_invoke_callback(cpu, state, bringup, node, > > > NULL); > > > #else > > > ret = cpuhp_invoke_callback(cpu, state, bringup, node, NULL); > > > #endif > > > BUG_ON(ret && !bringup); > > > return ret; > > > } > > > > > > Dawei Li (9): > > > perf/alibaba_uncore_drw: Avoid explicit cpumask var allocation from > > > stack > > > perf/arm-cmn: Avoid explicit cpumask var allocation from stack > > > perf/arm_cspmu: Avoid explicit cpumask var allocation from stack > > > perf/arm_dsu: Avoid explicit cpumask var allocation from stack > > > perf/dwc_pcie: Avoid explicit cpumask var allocation from stack > > > perf/hisi_pcie: Avoid explicit cpumask var allocation from stack > > > perf/hisi_uncore: Avoid explicit cpumask var allocation from stack > > > perf/qcom_l2: Avoid explicit cpumask var allocation from stack > > > perf/thunder_x2: Avoid explicit cpumask var allocation from stack > > > > > > drivers/perf/alibaba_uncore_drw_pmu.c | 13 +++++++++---- > > > drivers/perf/arm-cmn.c | 13 +++++++++---- > > > drivers/perf/arm_cspmu/arm_cspmu.c | 13 +++++++++---- > > > drivers/perf/arm_dsu_pmu.c | 18 +++++++++++++----- > > > drivers/perf/dwc_pcie_pmu.c | 17 +++++++++++------ > > > drivers/perf/hisilicon/hisi_pcie_pmu.c | 15 ++++++++++----- > > > drivers/perf/hisilicon/hisi_uncore_pmu.c | 13 +++++++++---- > > > drivers/perf/qcom_l2_pmu.c | 15 ++++++++++----- > > > drivers/perf/thunderx2_pmu.c | 20 ++++++++++++-------- > > > 9 files changed, 92 insertions(+), 45 deletions(-) > > > > > > > > > Thanks, > > > > > > Dawei > > > > > > -- > > > 2.27.0 > > > > >