Hi Fenghua,

On 12/4/24 11:14, Moger, Babu wrote:
> Hi Fenghua,
>
> Thanks for the review.
>
> On 12/3/24 22:16, Fenghua Yu wrote:
>> Hi, Babu,
>>
>> On 10/29/24 16:21, Babu Moger wrote:
>>> Assign/unassign counters on resctrl group creation/deletion. Two counters
>>> are required per group, one for MBM total event and one for MBM local
>>> event.
>>>
>>> There are a limited number of counters available for assignment. If these
>>> counters are exhausted, the kernel will display the error message: "Out of
>>> MBM assignable counters". However, it is not necessary to fail the
>>> creation of a group due to assignment failures. Users have the flexibility
>>> to modify the assignments at a later time.
>>>
>>> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
>>> ---
>>> v9: Changed rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs() to
>>>     return void.
>>>     Updated a couple of rdtgroup_unassign_cntrs() calls properly.
>>>     Updated function comments.
>>>
>>> v8: Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
>>>     Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
>>>     Fixed the problem with unassigning the child MON groups of CTRL_MON
>>>     group.
>>>
>>> v7: Reworded the commit message.
>>>     Removed the reference of ABMC with mbm_cntr_assign.
>>>     Renamed the function rdtgroup_assign_cntrs to rdtgroup_assign_grp.
>>>
>>> v6: Removed the redundant comments on all the calls of
>>>     rdtgroup_assign_cntrs. Updated the commit message.
>>>     Dropped printing error message on every call of rdtgroup_assign_cntrs.
>>>
>>> v5: Removed the code to enable/disable ABMC during the mount.
>>>     That will be another patch.
>>>     Added arch callers to get the arch specific data.
>>>     Renamed functions to match the other abmc function.
>>>     Added code comments for assignment failures.
>>>
>>> v4: Few name changes based on the upstream discussion.
>>>     Commit message update.
>>>
>>> v3: This is a new patch. Patch addresses the upstream comment to enable
>>>     ABMC feature by default if the feature is available.
>>> ---
>>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 61 +++++++++++++++++++++++++-
>>>  1 file changed, 60 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index b0cce3dfd062..a8d21b0b2054 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -2932,6 +2932,46 @@ static void schemata_list_destroy(void)
>>>      }
>>>  }
>>> +/*
>>> + * Called when a new group is created. If "mbm_cntr_assign" mode is enabled,
>>> + * counters are automatically assigned. Each group can accommodate two counters:
>>> + * one for the total event and one for the local event. Assignments may fail
>>> + * due to the limited number of counters. However, it is not necessary to fail
>>> + * the group creation and thus no failure is returned. Users have the option
>>> + * to modify the counter assignments after the group has been created.
>>> + */
>>> +static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>>> +{
>>> +    struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>> +
>>> +    if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>>> +        return;
>>> +
>>> +    if (is_mbm_total_enabled())
>>> +        rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
>>
>> In this code path,
>> resctrl_mkdir()->resctrl_mkdir_ctrl_mon()->rdtgroup_assign_cntrs()->rdtgroup_assign_cntr_event()
>>
>> CPUs are not protected by read lock while rdtgroup_assign_cntr_event()
>> walks r->mon_domains and runs counter assignment code on CPUs in the domains.
>> Without CPU protection, r->mon_domains may race with CPU hotplug.
>>
>> In another path (i.e. rdt_get_tree()), rdtgroup_assign_cntrs() is
>> protected by cpus_read_lock()/unlock().
>>
>> So maybe define two helpers:
>>
>> // Called when caller takes cpus_read_lock()
>> rdtgroup_assign_cntrs_locked()
>> {
>>     lockdep_assert_cpus_held();
>>
>>     then the current rdtgroup_assign_cntrs() code
>> }
>>
>> // Called when caller doesn't take cpus_read_lock()
>> rdtgroup_assign_cntrs()
>> {
>>     cpus_read_lock();
>>     rdtgroup_assign_cntrs_locked();
>>     cpus_read_unlock();
>> }
>>
>
> Good observation. Agree. There is a problem.
> Some of this code will change with earlier comments.
>
> We know a couple of paths are affected here. Why not just add the lock
> before calling it in the affected paths instead of adding new helpers?
>
> /*
>  * Walking r->domains in rdtgroup_assign_cntrs, ensure it can't race
>  * with cpuhp
>  */
> cpus_read_lock();
> rdtgroup_assign_cntrs()
> cpus_read_unlock();
>

Oh wait. Looks like we are good here. See Reinette's response.
Thanks Reinette.
https://lore.kernel.org/lkml/4032a5a5-dd0a-49ae-94b6-dc4fac4c190d@xxxxxxxxx/

>
>
>>> +
>>> +    if (is_mbm_local_enabled())
>>> +        rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
>>> +}
>>> +
>>> +/*
>>> + * Called when a group is deleted. Counters are unassigned if it was in
>>> + * assigned state.
>>> + */
>>> +static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
>>> +{
>>> +    struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>> +
>>> +    if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>>> +        return;
>>> +
>>> +    if (is_mbm_total_enabled())
>>> +        rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
>>> +
>>> +    if (is_mbm_local_enabled())
>>> +        rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
>>> +}
>>> +
>>
>> Seems rdtgroup_unassign_cntrs() is always protected by
>> cpus_read_lock()/unlock(). So it's good.
>
> ok
>
>>
>>>  static int rdt_get_tree(struct fs_context *fc)
>>>  {
>>>      struct rdt_fs_context *ctx = rdt_fc2context(fc);
>>> @@ -2991,6 +3031,8 @@ static int rdt_get_tree(struct fs_context *fc)
>>>          if (ret < 0)
>>>              goto out_mongrp;
>>>          rdtgroup_default.mon.mon_data_kn = kn_mondata;
>>> +
>>> +        rdtgroup_assign_cntrs(&rdtgroup_default);
>>
>> In this case, cpus_read_lock() was called earlier. Change to
>> rdtgroup_assign_cntrs_locked().
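
If we do end up needing the split suggested above, a minimal (untested)
sketch of the two helpers, reusing the current rdtgroup_assign_cntrs()
body for the _locked() variant, could look like this:

/* Caller must hold cpus_read_lock() */
static void rdtgroup_assign_cntrs_locked(struct rdtgroup *rdtgrp)
{
    struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;

    lockdep_assert_cpus_held();

    if (!resctrl_arch_mbm_cntr_assign_enabled(r))
        return;

    if (is_mbm_total_enabled())
        rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);

    if (is_mbm_local_enabled())
        rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
}

/* For callers that do not already hold cpus_read_lock() */
static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
{
    cpus_read_lock();
    rdtgroup_assign_cntrs_locked(rdtgrp);
    cpus_read_unlock();
}

rdt_get_tree() would then call rdtgroup_assign_cntrs_locked() directly,
since cpus_read_lock() is already held on that path.
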
>>
>>>      }
>>>
>>>      ret = rdt_pseudo_lock_init();
>>> @@ -3021,8 +3063,10 @@ static int rdt_get_tree(struct fs_context *fc)
>>>  out_psl:
>>>      rdt_pseudo_lock_release();
>>>  out_mondata:
>>> -    if (resctrl_arch_mon_capable())
>>> +    if (resctrl_arch_mon_capable()) {
>>> +        rdtgroup_unassign_cntrs(&rdtgroup_default);
>>>          kernfs_remove(kn_mondata);
>>> +    }
>>>  out_mongrp:
>>>      if (resctrl_arch_mon_capable())
>>>          kernfs_remove(kn_mongrp);
>>> @@ -3201,6 +3245,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
>>>      head = &rdtgrp->mon.crdtgrp_list;
>>>      list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
>>> +        rdtgroup_unassign_cntrs(sentry);
>>>          free_rmid(sentry->closid, sentry->mon.rmid);
>>>          list_del(&sentry->mon.crdtgrp_list);
>>> @@ -3241,6 +3286,8 @@ static void rmdir_all_sub(void)
>>>          cpumask_or(&rdtgroup_default.cpu_mask,
>>>                     &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
>>> +        rdtgroup_unassign_cntrs(rdtgrp);
>>> +
>>>          free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>>          kernfs_remove(rdtgrp->kn);
>>> @@ -3272,6 +3319,7 @@ static void rdt_kill_sb(struct super_block *sb)
>>>      for_each_alloc_capable_rdt_resource(r)
>>>          reset_all_ctrls(r);
>>>      rmdir_all_sub();
>>> +    rdtgroup_unassign_cntrs(&rdtgroup_default);
>>>      rdt_pseudo_lock_release();
>>>      rdtgroup_default.mode = RDT_MODE_SHAREABLE;
>>>      schemata_list_destroy();
>>> @@ -3280,6 +3328,7 @@ static void rdt_kill_sb(struct super_block *sb)
>>>      resctrl_arch_disable_alloc();
>>>      if (resctrl_arch_mon_capable())
>>>          resctrl_arch_disable_mon();
>>> +
>>
>> Unnecessary change.
>
> ok.
>
>>
>>>      resctrl_mounted = false;
>>>      kernfs_kill_sb(sb);
>>>      mutex_unlock(&rdtgroup_mutex);
>>> @@ -3871,6 +3920,8 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
>>>          goto out_unlock;
>>>      }
>>> +    rdtgroup_assign_cntrs(rdtgrp);
>>> +
>>>      kernfs_activate(rdtgrp->kn);
>>>      /*
>>> @@ -3915,6 +3966,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>>>      if (ret)
>>>          goto out_closid_free;
>>> +    rdtgroup_assign_cntrs(rdtgrp);
>>> +
>>>      kernfs_activate(rdtgrp->kn);
>>>      ret = rdtgroup_init_alloc(rdtgrp);
>>> @@ -3940,6 +3993,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>>>  out_del_list:
>>>      list_del(&rdtgrp->rdtgroup_list);
>>>  out_rmid_free:
>>> +    rdtgroup_unassign_cntrs(rdtgrp);
>>>      mkdir_rdt_prepare_rmid_free(rdtgrp);
>>>  out_closid_free:
>>>      closid_free(closid);
>>> @@ -4010,6 +4064,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>>>      update_closid_rmid(tmpmask, NULL);
>>>      rdtgrp->flags = RDT_DELETED;
>>> +
>>> +    rdtgroup_unassign_cntrs(rdtgrp);
>>> +
>>>      free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>>      /*
>>> @@ -4056,6 +4113,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>>>      cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
>>>      update_closid_rmid(tmpmask, NULL);
>>> +    rdtgroup_unassign_cntrs(rdtgrp);
>>> +
>>>      free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>>      closid_free(rdtgrp->closid);
>>>
>>
>> Thanks.
>>
>> -Fenghua
>>
>

--
Thanks
Babu Moger