On Fri, Aug 11, 2023 at 10:29:25AM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 7/22/2023 12:07 PM, Tony Luck wrote:
> > First step towards supporting resource control where the scope of
> > control operations is not the same as monitor operations.
> 
> Each changelog should stand on its own merit. This changelog
> appears to be written as a continuation of the cover letter.
> 
> Please do ensure that each patch first establishes the context
> before it describes the problem and solution. For example,
> as a context this changelog can start by describing what the
> resctrl domains list represents.
> 
> > 
> > Add an extra list in the rdt_resource structure. For this will
> > just duplicate the existing list of domains based on the L3 cache
> > scope.
> 
> The above paragraph does not make this change appealing at all.
> 
> > Refactor the domain_add_cpu() and domain_remove() functions to
> 
> domain_remove() -> domain_remove_cpu()
> 
> > build separate lists for r->alloc_capable and r->mon_capable
> > resources. Note that only the "L3" domain currently supports
> > both types.
> 
> "L3" domain -> "L3" resource?
> 
> > 
> > Change all places where monitoring functions walk the list of
> > domains to use the new "mondomains" list instead of the old
> > "domains" list.
> 
> I would not refer to it as "the old domains list" as it creates
> impression that this is being replaced. The changelog makes
> no mention that domains list will remain and be dedicated to
> control domains. I think this is important to include in description
> of this change.

I've rewritten the entire commit message incorporating your suggestions.
V6 will be posted soon (after I get some time on an SNC SPR to check
that it all works!)

> > 
> > Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> > ---
> >  include/linux/resctrl.h | 10 +-
> >  arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
> >  arch/x86/kernel/cpu/resctrl/core.c | 195 +++++++++++++++-------
> >  arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 2 +-
> >  arch/x86/kernel/cpu/resctrl/monitor.c | 2 +-
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 30 ++--
> >  6 files changed, 167 insertions(+), 74 deletions(-)
> > 
> > diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> > index 8334eeacfec5..1267d56f9e76 100644
> > --- a/include/linux/resctrl.h
> > +++ b/include/linux/resctrl.h
> > @@ -151,9 +151,11 @@ struct resctrl_schema;
> >   * @mon_capable: Is monitor feature available on this machine
> >   * @num_rmid: Number of RMIDs available
> >   * @cache_level: Which cache level defines scope of this resource
> > + * @mon_scope: Scope of this resource if different from cache_level
> 
> I think this addition should be deferred. As it is here the "if different
> from cache_level" also creates many questions (when will it be different?
> how will it be determined that the scope is different in order to know that
> mon_scope should be used?)

I've gone in a different direction. V6 renames "cache_level" to
"ctrl_scope". I think this makes the intent clear from step #1.

> 
> Looking ahead on how mon_scope is used there does not seem to be an "if"
> involved at all ... mon_scope is always the monitoring scope.

Dropped the "if different" comment. You are correct that this is
unconditionally the monitor scope.

> 
> >   * @cache: Cache allocation related data
> >   * @membw: If the component has bandwidth controls, their properties.
> >   * @domains: All domains for this resource
> 
> A change to the domains comment would also help - to highlight that it is
> now dedicated to control domains.

Done.
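To make that direction concrete, here is a rough sketch (illustrative
only, not the actual V6 hunk; the stand-alone struct name below is made
up for the example) of the scope and domain-list fields this discussion
touches, with cache_level renamed to ctrl_scope and the comments reworded
along the lines above:

#include <linux/list.h>

/*
 * Sketch only: just the fields under discussion, pulled out into a
 * stand-alone struct for illustration. The real struct rdt_resource
 * in <linux/resctrl.h> has many more members.
 */
struct rdt_resource_scope_sketch {
        int ctrl_scope;                 /* Scope of control operations (was cache_level) */
        int mon_scope;                  /* Scope of monitor operations (always used) */
        struct list_head domains;       /* Control domains for this resource */
        struct list_head mondomains;    /* Monitor domains for this resource */
};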
> 
> > + * @mondomains: Monitor domains for this resource
> >   * @name: Name to use in "schemata" file.
> >   * @data_width: Character width of data when displaying
> >   * @default_ctrl: Specifies default cache cbm or memory B/W percent.
> > @@ -169,9 +171,11 @@ struct rdt_resource {
> >  	bool mon_capable;
> >  	int num_rmid;
> >  	int cache_level;
> > +	int mon_scope;
> >  	struct resctrl_cache cache;
> >  	struct resctrl_membw membw;
> >  	struct list_head domains;
> > +	struct list_head mondomains;
> >  	char *name;
> >  	int data_width;
> >  	u32 default_ctrl;
> 
> ...
> 
> > @@ -384,14 +386,15 @@ void rdt_ctrl_update(void *arg)
> >  }
> > 
> >  /*
> > - * rdt_find_domain - Find a domain in a resource that matches input resource id
> > + * rdt_find_domain - Find a domain in one of the lists for a resource that
> > + *                   matches input resource id
> >   *
> 
> This change makes the function more vague. I think original summary is
> still accurate, how the list is used can be described in the details below.
> I see more changes to this function are upcoming and I will comment more
> at those sites.

Re-worked this based on your other suggestions to have separate find*
functions for the ctrl and mon cases calling a common __rdt_find_domain().

> 
> >   * Search resource r's domain list to find the resource id. If the resource
> >   * id is found in a domain, return the domain. Otherwise, if requested by
> >   * caller, return the first domain whose id is bigger than the input id.
> >   * The domain list is sorted by id in ascending order.
> >   */
> > -struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
> > +struct rdt_domain *rdt_find_domain(struct list_head *h, int id,
> >  				    struct list_head **pos)
> >  {
> >  	struct rdt_domain *d;
> > @@ -400,7 +403,7 @@ struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
> >  	if (id < 0)
> >  		return ERR_PTR(-ENODEV);
> > 
> > -	list_for_each(l, &r->domains) {
> > +	list_for_each(l, h) {
> >  		d = list_entry(l, struct rdt_domain, list);
> >  		/* When id is found, return its domain. */
> >  		if (id == d->id)
> > @@ -487,6 +490,94 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
> >  	return 0;
> >  }
> > 
> > +static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
> > +{
> > +	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
> > +	struct list_head *add_pos = NULL;
> > +	struct rdt_hw_domain *hw_dom;
> > +	struct rdt_domain *d;
> > +	int err;
> > +
> > +	d = rdt_find_domain(&r->domains, id, &add_pos);
> > +	if (IS_ERR(d)) {
> > +		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
> > +		return;
> > +	}
> > +
> > +	if (d) {
> > +		cpumask_set_cpu(cpu, &d->cpu_mask);
> > +		if (r->cache.arch_has_per_cpu_cfg)
> > +			rdt_domain_reconfigure_cdp(r);
> > +		return;
> > +	}
> > +
> > +	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
> > +	if (!hw_dom)
> > +		return;
> > +
> > +	d = &hw_dom->d_resctrl;
> > +	d->id = id;
> > +	cpumask_set_cpu(cpu, &d->cpu_mask);
> > +
> > +	rdt_domain_reconfigure_cdp(r);
> > +
> > +	if (domain_setup_ctrlval(r, d)) {
> > +		domain_free(hw_dom);
> > +		return;
> > +	}
> > +
> > +	list_add_tail(&d->list, add_pos);
> > +
> > +	err = resctrl_online_ctrl_domain(r, d);
> > +	if (err) {
> > +		list_del(&d->list);
> > +		domain_free(hw_dom);
> > +	}
> > +}
> > +
> > +static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> > +{
> > +	int id = get_cpu_cacheinfo_id(cpu, r->mon_scope);
> 
> Using a different scope variable but continuing to treat it
> as a cache level creates unnecessary confusion at this point.
> 
> > +	struct list_head *add_pos = NULL;
> > +	struct rdt_hw_domain *hw_dom;
> > +	struct rdt_domain *d;
> > +	int err;
> > +
> > +	d = rdt_find_domain(&r->mondomains, id, &add_pos);
> > +	if (IS_ERR(d)) {
> > +		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
> 
> Note for future change ... this continues to refer to monitor scope as
> a cache id. I did not see this changed in the later patch that actually
> changes how scope is used.

Changed all these messages to refer to scope instead of cache id.

> 
> > +		return;
> > +	}
> > +
> > +	if (d) {
> > +		cpumask_set_cpu(cpu, &d->cpu_mask);
> > +		if (r->cache.arch_has_per_cpu_cfg)
> > +			rdt_domain_reconfigure_cdp(r);
> 
> Copy & paste error?

Indeed yes. This code isn't appropriate for the monitor case. Deleted.

> 
> > +		return;
> > +	}
> > +
> > +	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
> > +	if (!hw_dom)
> > +		return;
> > +
> > +	d = &hw_dom->d_resctrl;
> > +	d->id = id;
> > +	cpumask_set_cpu(cpu, &d->cpu_mask);
> > +
> > +	if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> > +		domain_free(hw_dom);
> > +		return;
> > +	}
> > +
> > +	list_add_tail(&d->list, add_pos);
> > +
> > +	err = resctrl_online_mon_domain(r, d);
> > +	if (err) {
> > +		list_del(&d->list);
> > +		domain_free(hw_dom);
> > +	}
> > +}
> > +
> >  /*
> >   * domain_add_cpu - Add a cpu to a resource's domain list.
> >   *
> > @@ -502,61 +593,19 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
> >   */
> >  static void domain_add_cpu(int cpu, struct rdt_resource *r)
> >  {
> > -	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
> > -	struct list_head *add_pos = NULL;
> > -	struct rdt_hw_domain *hw_dom;
> > -	struct rdt_domain *d;
> > -	int err;
> > -
> > -	d = rdt_find_domain(r, id, &add_pos);
> > -	if (IS_ERR(d)) {
> > -		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
> > -		return;
> > -	}
> > -
> > -	if (d) {
> > -		cpumask_set_cpu(cpu, &d->cpu_mask);
> > -		if (r->cache.arch_has_per_cpu_cfg)
> > -			rdt_domain_reconfigure_cdp(r);
> > -		return;
> > -	}
> > -
> > -	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
> > -	if (!hw_dom)
> > -		return;
> > -
> > -	d = &hw_dom->d_resctrl;
> > -	d->id = id;
> > -	cpumask_set_cpu(cpu, &d->cpu_mask);
> > -
> > -	rdt_domain_reconfigure_cdp(r);
> > -
> > -	if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
> > -		domain_free(hw_dom);
> > -		return;
> > -	}
> > -
> > -	if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> > -		domain_free(hw_dom);
> > -		return;
> > -	}
> > -
> > -	list_add_tail(&d->list, add_pos);
> > -
> > -	err = resctrl_online_domain(r, d);
> > -	if (err) {
> > -		list_del(&d->list);
> > -		domain_free(hw_dom);
> > -	}
> > +	if (r->alloc_capable)
> > +		domain_add_cpu_ctrl(cpu, r);
> > +	if (r->mon_capable)
> > +		domain_add_cpu_mon(cpu, r);
> >  }
> 
> A resource could be both alloc and mon capable ... both
> domain_add_cpu_ctrl() and domain_add_cpu_mon() can fail.
> Should domain_add_cpu_mon() still be run for a CPU if
> domain_add_cpu_ctrl() failed?
> 
> Looking ahead the CPU should probably also not be added
> to the default group's mask if a failure occurred.

Existing code doesn't do anything for the case where a CPU can't be
added to a domain (probably the only real error case is failure to
allocate memory for the domain structure). May be something to tackle
in a future series if anyone thinks this is a serious problem and has
suggestions on what to do.
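If we did decide to surface these failures, one possible shape (purely a
sketch for discussion, not what this patch does; it assumes the split
helpers above were changed to return 0 or -errno instead of void) would
be to let domain_add_cpu() stop at, and propagate, the first error:

static int domain_add_cpu(int cpu, struct rdt_resource *r)
{
        int err;

        if (r->alloc_capable) {
                err = domain_add_cpu_ctrl(cpu, r);
                /* Skip monitor setup if control setup already failed */
                if (err)
                        return err;
        }

        if (r->mon_capable)
                return domain_add_cpu_mon(cpu, r);

        return 0;
}

The caller could then note that a domain is incomplete and, for example,
keep the CPU out of the default group's mask or refuse to mount resctrl.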
It seems like a catastrophic problem to not have some CPUs in some/all
domains of some resources. Maybe this should disable mounting resctrl
filesystem completely?

> 
> > -static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> > +static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
> >  {
> >  	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
> >  	struct rdt_hw_domain *hw_dom;
> >  	struct rdt_domain *d;
> > 
> > -	d = rdt_find_domain(r, id, NULL);
> > +	d = rdt_find_domain(&r->domains, id, NULL);
> >  	if (IS_ERR_OR_NULL(d)) {
> >  		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
> >  		return;
> > @@ -565,7 +614,7 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> > 
> >  	cpumask_clear_cpu(cpu, &d->cpu_mask);
> >  	if (cpumask_empty(&d->cpu_mask)) {
> > -		resctrl_offline_domain(r, d);
> > +		resctrl_offline_ctrl_domain(r, d);
> >  		list_del(&d->list);
> > 
> >  		/*
> > @@ -578,6 +627,30 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> > 
> >  		return;
> >  	}
> > +}
> > +
> > +static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
> > +{
> > +	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
> 
> Introducing mon_scope can really be deferred ... here the monitoring code
> is not using mon_scope anyway.

Not deferring. But I did fix this to use r->mon_scope. Good catch.

> 
> > +	struct rdt_hw_domain *hw_dom;
> > +	struct rdt_domain *d;
> > +
> > +	d = rdt_find_domain(&r->mondomains, id, NULL);
> > +	if (IS_ERR_OR_NULL(d)) {
> > +		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
> > +		return;
> > +	}
> > +	hw_dom = resctrl_to_arch_dom(d);
> > +
> > +	cpumask_clear_cpu(cpu, &d->cpu_mask);
> > +	if (cpumask_empty(&d->cpu_mask)) {
> > +		resctrl_offline_mon_domain(r, d);
> > +		list_del(&d->list);
> > +
> > +		domain_free(hw_dom);
> > +
> > +		return;
> > +	}
> > 
> >  	if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
> >  		if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
> 
> Reinette

-Tony