Re: [PATCH v9 18/26] x86/resctrl: Add the interface to assign/update counter assignment

"Moger, Babu" <babu.moger@xxxxxxx> · Wed, 20 Nov 2024 12:05:13 -0600

Hi Reinette,

On 11/15/24 18:57, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/29/24 4:21 PM, Babu Moger wrote:
>> The mbm_cntr_assign mode offers several hardware counters that can be
>> assigned to an RMID, event pair and monitor the bandwidth as long as it
>> is assigned.
>>
>> Counters are managed at two levels. The global assignment is tracked
>> using the mbm_cntr_free_map field in the struct resctrl_mon, while
>> domain-specific assignments are tracked using the mbm_cntr_map field
>> in the struct rdt_mon_domain. Allocation begins at the global level
>> and is then applied individually to each domain.
>>
>> Introduce an interface to allocate these counters and update the
>> corresponding domains accordingly.
>>
>> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
>> ---
> 
> ...
> 
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 00f7bf60e16a..cb496bd97007 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -717,6 +717,8 @@ unsigned int mon_event_config_index_get(u32 evtid);
>>  int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>  			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
>>  			     u32 cntr_id, bool assign);
>> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> +			       struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>>  void rdt_staged_configs_clear(void);
>>  bool closid_allocated(unsigned int closid);
>>  int resctrl_find_cleanest_closid(void);
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 1b5529c212f5..bc3752967c44 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1924,6 +1924,93 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>  	return 0;
>>  }
>>  
>> +/*
>> + * Configure the counter for the event, RMID pair for the domain.
>> + * Update the bitmap and reset the architectural state.
>> + */
>> +static int resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +			       enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> +			       u32 cntr_id, bool assign)
>> +{
>> +	int ret;
>> +
>> +	ret = resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id, assign);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (assign)
>> +		__set_bit(cntr_id, d->mbm_cntr_map);
>> +	else
>> +		__clear_bit(cntr_id, d->mbm_cntr_map);
>> +
>> +	/*
>> +	 * Reset the architectural state so that reading of hardware
>> +	 * counter is not considered as an overflow in next update.
>> +	 */
>> +	resctrl_arch_reset_rmid(r, d, closid, rmid, evtid);
> 
> resctrl_arch_reset_rmid() expects to be run on a CPU that is in the domain
> @d ... note that after the architectural state is reset it initializes the
> state by reading the event on the current CPU. By running it here it is
> run on a random CPU that may not be in the right domain.

Yes. That is correct.  We can move this part to our earlier
implementation. We dont need to read the RMID.  We just have to reset the
counter.

https://lore.kernel.org/lkml/16d88cc4091cef1999b7ec329364e12dd0dc748d.1728495588.git.babu.moger@xxxxxxx/

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 9fe419d0c536..bc3654ec3a08 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2371,6 +2371,13 @@ int resctrl_arch_config_cntr(struct rdt_resource
*r, struct rdt_mon_domain *d,
        smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd,
                              &abmc_cfg, 1);

+       /*
+        * Reset the architectural state so that reading of hardware
+        * counter is not considered as an overflow in next update.
+        */
+       if (arch_mbm)
+               memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
+
        return 0;
 }


> 
>> +
>> +	return ret;
>> +}
>> +
>> +static bool mbm_cntr_assigned_to_domain(struct rdt_resource *r, u32 cntr_id)
>> +{
>> +	struct rdt_mon_domain *d;
>> +
>> +	list_for_each_entry(d, &r->mon_domains, hdr.list)
>> +		if (test_bit(cntr_id, d->mbm_cntr_map))
>> +			return 1;
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Assign a hardware counter to event @evtid of group @rdtgrp.
>> + * Counter will be assigned to all the domains if rdt_mon_domain is NULL
>> + * else the counter will be assigned to specific domain.
>> + */
>> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> +			       struct rdt_mon_domain *d, enum resctrl_event_id evtid)
>> +{
>> +	int index = MBM_EVENT_ARRAY_INDEX(evtid);
>> +	int cntr_id = rdtgrp->mon.cntr_id[index];
>> +	int ret;
>> +
>> +	/*
>> +	 * Allocate a new counter id to the event if the counter is not
>> +	 * assigned already.
>> +	 */
>> +	if (cntr_id == MON_CNTR_UNSET) {
>> +		cntr_id = mbm_cntr_alloc(r);
>> +		if (cntr_id < 0) {
>> +			rdt_last_cmd_puts("Out of MBM assignable counters\n");
>> +			return -ENOSPC;
>> +		}
>> +		rdtgrp->mon.cntr_id[index] = cntr_id;
>> +	}
>> +
>> +	if (!d) {
>> +		list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +			ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>> +						  rdtgrp->closid, cntr_id, true);
>> +			if (ret)
>> +				goto out_done_assign;
> 
> This may not be what users expect. What if, for example, domain #1 has a counter
> assigned to "total" event and then user wants to change that to 
> assign a counter to "total" event of all domains. Would this not reconfigure the
> counter associated with domain #1 and unnecessarily reset it? Could this be
> made a bit smarter to only configure a counter on a domain if it is not already
> configured? This could perhaps form part of resctrl_config_cntr() to not scatter
> the duplicate check everywhere. What do you think?

Yes. that is correct. We can add a check in resctrl_config_cntr().

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 9fe419d0c536..bc3654ec3a08 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c

@@ -2384,6 +2391,10 @@ static int resctrl_config_cntr(struct rdt_resource
*r, struct rdt_mon_domain *d,
 {
        int ret;

+       /* Return success if the domain is in expected assign state already */
+       if (assign == test_bit(cntr_id, d->mbm_cntr_map))
+               return 0;
+
        ret = resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id,
assign);
        if (ret)
                return ret;


> 
> Also, looks like this can do partial assignment. For example, if one of the
> domains encounter a failure then domains already configured are not undone. This
> matches other similar flows but is not documented and left to reader to decipher. 

I will add the text in patch 26
(x86/resctrl: Introduce interface to modify assignment states of the groups).

> 
> 
>> +		}
>> +	} else {
>> +		ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>> +					  rdtgrp->closid, cntr_id, true);
> 
> Looking at flows calling rdtgroup_assign_cntr_event() I do not see a check
> if counter is already assigned. So, if a user makes a loop of assigning a counter
> to the same event over and over it will result in an IPI every time. This seems
> unnecessary, what do you think?

This will be taken care by the above check in resctrl_config_cntr().

> 
>> +		if (ret)
>> +			goto out_done_assign;
>> +	}
>> +
>> +out_done_assign:
>> +	if (ret && !mbm_cntr_assigned_to_domain(r, cntr_id)) {
>> +		mbm_cntr_free(r, cntr_id);
>> +		rdtgroup_cntr_id_init(rdtgrp, evtid);
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  /* rdtgroup information files for one cache resource. */
>>  static struct rftype res_common_files[] = {
>>  	{
> 
> Reinette
> 
> 

-- 
Thanks
Babu Moger