Re: [PATCH v3 3/8] x86/resctrl: Add a new node-scoped resource to rdt_resources_all[]

Reinette Chatre <reinette.chatre@xxxxxxxxx> · Thu, 20 Jul 2023 10:27:46 -0700

Hi Tony,

On 7/19/2023 5:20 PM, Tony Luck wrote:
> Here's a quick hack to see how things might look with
> separate domain lists in the "L3" resource.
> 
> For testing purposes on a non-SNC system I set ->mon_scope =
> MON_SCOPE_NODE, but made domain_add_cpu() allocate the mondomains
> list based on L3 scope ... just so I could check that I found all
> the places where monitoring needs to use the mondomains list.
> The kernel doesn't crash when running tools/testing/selftests/resctrl,
> and the tests all pass. But that doesn't mean I didn't miss something.
> 
> Some restructuring of control vs. monitoing initialization might
> avoid some of the code I duplicated in domain_add_cpu(). But this
> is intended just as a "Is this what you meant?" before I dig deeper.

Thank you for considering the approach. I find that this sample move
towards the idea while also highlighting what else can be considered.
I do not know if you already considered these ideas and found it flawed
so I will try to make it explicit so that you can point out to me where
things will fall apart.

The sample code introduces a new list "mondomains" that is intended to
be used when the monitoring scope is different from the allocation scope.
This introduces duplication when the monitoring and allocation scope is
different. Each list, "domains" and "mondomains" will host structures
that can accommodate both monitoring and allocation data, with the data
not relevant to the list going unused as it is unnecessarily duplicated.
Additionally this forces significant portions of resctrl to now always
consider whether the monitoring and allocation scope is different ...
note how this sample now has code like below scattered throughout.
   h = r->mon_scope ? &r->mondomains : &r->domains;

I also find the domain_add_cpu() becoming intricate as it needs to
navigate all the different scenarios.

This unnecessary duplication, new environment checks needed throughout,
and additional code complexities are red flags to me that this solution
is not well integrated.

To deal with these complexities I would like to consider if it may
make things simpler to always (irrespective of allocation and
monitoring scope) maintain allocation and monitoring domain lists.
Each list need only carry data appropriate to its use ... the allocation
list only has data relevant to allocation, the monitoring list only
has data relevant to monitoring. This is the struct rdt_domain related
split I mentioned previously.

Code could become something like:

resctrl_online_cpu()
{
	...
	for_each_alloc_capable_rdt_resource(r)
		alloc_domain_add_cpu(...)
	for_each_mon_capable_rdt_resource(r)
		mon_domain_add_cpu(...)
	...
}

This would reduce complication in domain_add_cpu() since each domain list
only need to concern itself with monitoring or allocation.

Even resctrl_online_domain() can be siplified significantly by
making it specific to allocation or monitoring. For example,
resctrl_online_mon_domain() would only and always just run
the monitoring related code.

With the separate allocation and monitoring domain lists there
may no longer be a need for scattering code with checks like:
   h = r->mon_scope ? &r->mondomains : &r->domains;
This would be because the code can directly pick the domain
list it is operating on.

What do you think? The above is just refactoring of existing
code and from what I can tell this would make supporting
SNC straight forward.

Reinette