Re: [PATCH v9 20/26] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled

"Moger, Babu" <babu.moger@xxxxxxx> · Wed, 4 Dec 2024 11:14:41 -0600

Hi Fenghua,

Thanks for the review.

On 12/3/24 22:16, Fenghua Yu wrote:
> Hi, Babu,
> 
> On 10/29/24 16:21, Babu Moger wrote:
>> Assign/unassign counters on resctrl group creation/deletion. Two counters
>> are required per group, one for MBM total event and one for MBM local
>> event.
>>
>> There are a limited number of counters available for assignment. If these
>> counters are exhausted, the kernel will display the error message: "Out of
>> MBM assignable counters". However, it is not necessary to fail the
>> creation of a group due to assignment failures. Users have the flexibility
>> to modify the assignments at a later time.
>>
>> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
>> ---
>> v9: Changed rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs() to
>> return void.
>>      Updated couple of rdtgroup_unassign_cntrs() calls properly.
>>      Updated function comments.
>>
>> v8: Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
>>      Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
>>      Fixed the problem with unassigning the child MON groups of CTRL_MON
>> group.
>>
>> v7: Reworded the commit message.
>>      Removed the reference of ABMC with mbm_cntr_assign.
>>      Renamed the function rdtgroup_assign_cntrs to rdtgroup_assign_grp.
>>
>> v6: Removed the redundant comments on all the calls of
>>      rdtgroup_assign_cntrs. Updated the commit message.
>>      Dropped printing error message on every call of rdtgroup_assign_cntrs.
>>
>> v5: Removed the code to enable/disable ABMC during the mount.
>>      That will be another patch.
>>      Added arch callers to get the arch specific data.
>>      Renamed fuctions to match the other abmc function.
>>      Added code comments for assignment failures.
>>
>> v4: Few name changes based on the upstream discussion.
>>      Commit message update.
>>
>> v3: This is a new patch. Patch addresses the upstream comment to enable
>>      ABMC feature by default if the feature is available.
>> ---
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 61 +++++++++++++++++++++++++-
>>   1 file changed, 60 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index b0cce3dfd062..a8d21b0b2054 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2932,6 +2932,46 @@ static void schemata_list_destroy(void)
>>       }
>>   }
>>   +/*
>> + * Called when a new group is created. If "mbm_cntr_assign" mode is
>> enabled,
>> + * counters are automatically assigned. Each group can accommodate two
>> counters:
>> + * one for the total event and one for the local event. Assignments may
>> fail
>> + * due to the limited number of counters. However, it is not necessary
>> to fail
>> + * the group creation and thus no failure is returned. Users have the
>> option
>> + * to modify the counter assignments after the group has been created.
>> + */
>> +static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>> +{
>> +    struct rdt_resource *r =
>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +
>> +    if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>> +        return;
>> +
>> +    if (is_mbm_total_enabled())
>> +        rdtgroup_assign_cntr_event(r, rdtgrp, NULL,
>> QOS_L3_MBM_TOTAL_EVENT_ID);
> 
> In this code path,
> resctrl_mkdir()->resctrl_mkdir_ctrl_mon()->rdtgroup_assign_cntrs()->rdtgroup_assign_cntr_event()
> 
> CPUs are not protected by read lock while rdtgroup_assign_cntr_event()
> walks r->mon_domains and run assing counters code on CPUs in the domains.
> Without CPU protection, r->mon_domains may race with CPU hotplug.
> 
> In another patch (i.e. rdt_get_tree()), rdtgroup_assign_cntrs() is
> protected by cpus_read_lock()/unlock().
> 
> So maybe define two helpers:
> 
> // Called when caller takes cpus_read_lock()
> rdtgroup_assign_cntrs_locked()
> {
>     lockdep_assert_cpus_held();
> 
>     then the current rdtgroup_assign_cntrs() code
> }
> 
> // Called when caller doesn't take cpus_read_lock()
> rdtgroup_assign_cntrs()
> {
>     cpus_read_lock();
>     rdtgroup_assign_cntrs_locked();
>     cpus_read_unlock();
> }
> 

Good observation. Agree. There is a problem.
Some of this code will change with earlier comments.

We know couple of paths are affected here. Why not just add the lock
before calling in affected paths instead of adding new helpers?

/*
 * Walking r->domains in rdtgroup_assign_cntrs, ensure it can't race
 * with cpuhp
 */
cpus_read_lock();
rdtgroup_assign_cntrs()
cpus_read_unlock();

>> +
>> +    if (is_mbm_local_enabled())
>> +        rdtgroup_assign_cntr_event(r, rdtgrp, NULL,
>> QOS_L3_MBM_LOCAL_EVENT_ID);
>> +}
>> +
>> +/*
>> + * Called when a group is deleted. Counters are unassigned if it was in
>> + * assigned state.
>> + */
>> +static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
>> +{
>> +    struct rdt_resource *r =
>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +
>> +    if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>> +        return;
>> +
>> +    if (is_mbm_total_enabled())
>> +        rdtgroup_unassign_cntr_event(r, rdtgrp, NULL,
>> QOS_L3_MBM_TOTAL_EVENT_ID);
>> +
>> +    if (is_mbm_local_enabled())
>> +        rdtgroup_unassign_cntr_event(r, rdtgrp, NULL,
>> QOS_L3_MBM_LOCAL_EVENT_ID);
>> +}
>> +
> 
> Seems rdtgroup_unassign_cntrs() is always protected by
> cpus_read_lock()/unlock(). So it's good.

ok

> 
>>   static int rdt_get_tree(struct fs_context *fc)
>>   {
>>       struct rdt_fs_context *ctx = rdt_fc2context(fc);
>> @@ -2991,6 +3031,8 @@ static int rdt_get_tree(struct fs_context *fc)
>>           if (ret < 0)
>>               goto out_mongrp;
>>           rdtgroup_default.mon.mon_data_kn = kn_mondata;
>> +
>> +        rdtgroup_assign_cntrs(&rdtgroup_default);
> 
> In this case, cpus_read_lock() was called earlier. Change to
> rdtgroup_assign_cntrs_locked().
> 
>>       }
>>         ret = rdt_pseudo_lock_init();
>> @@ -3021,8 +3063,10 @@ static int rdt_get_tree(struct fs_context *fc)
>>   out_psl:
>>       rdt_pseudo_lock_release();
>>   out_mondata:
>> -    if (resctrl_arch_mon_capable())
>> +    if (resctrl_arch_mon_capable()) {
>> +        rdtgroup_unassign_cntrs(&rdtgroup_default);
>>           kernfs_remove(kn_mondata);
>> +    }
>>   out_mongrp:
>>       if (resctrl_arch_mon_capable())
>>           kernfs_remove(kn_mongrp);
>> @@ -3201,6 +3245,7 @@ static void free_all_child_rdtgrp(struct rdtgroup
>> *rdtgrp)
>>         head = &rdtgrp->mon.crdtgrp_list;
>>       list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
>> +        rdtgroup_unassign_cntrs(sentry);
>>           free_rmid(sentry->closid, sentry->mon.rmid);
>>           list_del(&sentry->mon.crdtgrp_list);
>>   @@ -3241,6 +3286,8 @@ static void rmdir_all_sub(void)
>>           cpumask_or(&rdtgroup_default.cpu_mask,
>>                  &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
>>   +        rdtgroup_unassign_cntrs(rdtgrp);
>> +
>>           free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>             kernfs_remove(rdtgrp->kn);
>> @@ -3272,6 +3319,7 @@ static void rdt_kill_sb(struct super_block *sb)
>>       for_each_alloc_capable_rdt_resource(r)
>>           reset_all_ctrls(r);
>>       rmdir_all_sub();
>> +    rdtgroup_unassign_cntrs(&rdtgroup_default);
>>       rdt_pseudo_lock_release();
>>       rdtgroup_default.mode = RDT_MODE_SHAREABLE;
>>       schemata_list_destroy();
>> @@ -3280,6 +3328,7 @@ static void rdt_kill_sb(struct super_block *sb)
>>           resctrl_arch_disable_alloc();
>>       if (resctrl_arch_mon_capable())
>>           resctrl_arch_disable_mon();
>> +
> 
> Unnecessary change.

ok.

> 
>>       resctrl_mounted = false;
>>       kernfs_kill_sb(sb);
>>       mutex_unlock(&rdtgroup_mutex);
>> @@ -3871,6 +3920,8 @@ static int rdtgroup_mkdir_mon(struct kernfs_node
>> *parent_kn,
>>           goto out_unlock;
>>       }
>>   +    rdtgroup_assign_cntrs(rdtgrp);
>> + >       kernfs_activate(rdtgrp->kn);
>>         /*
>> @@ -3915,6 +3966,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct
>> kernfs_node *parent_kn,
>>       if (ret)
>>           goto out_closid_free;
>>   +    rdtgroup_assign_cntrs(rdtgrp);
>> +
>>       kernfs_activate(rdtgrp->kn);
>>         ret = rdtgroup_init_alloc(rdtgrp);
>> @@ -3940,6 +3993,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct
>> kernfs_node *parent_kn,
>>   out_del_list:
>>       list_del(&rdtgrp->rdtgroup_list);
>>   out_rmid_free:
>> +    rdtgroup_unassign_cntrs(rdtgrp);
>>       mkdir_rdt_prepare_rmid_free(rdtgrp);
>>   out_closid_free:
>>       closid_free(closid);
>> @@ -4010,6 +4064,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup
>> *rdtgrp, cpumask_var_t tmpmask)
>>       update_closid_rmid(tmpmask, NULL);
>>         rdtgrp->flags = RDT_DELETED;
>> +
>> +    rdtgroup_unassign_cntrs(rdtgrp);
>> +
>>       free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>         /*
>> @@ -4056,6 +4113,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup
>> *rdtgrp, cpumask_var_t tmpmask)
>>       cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
>>       update_closid_rmid(tmpmask, NULL);
>>   +    rdtgroup_unassign_cntrs(rdtgrp);
>> +
>>       free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>       closid_free(rdtgrp->closid);
>>   
> 
> Thanks.
> 
> -Fenghua
> 

-- 
Thanks
Babu Moger