Re: [PATCH v3 3/8] x86/resctrl: Add a new node-scoped resource to rdt_resources_all[]

Reinette Chatre <reinette.chatre@xxxxxxxxx> · Tue, 18 Jul 2023 16:45:47 -0700

Hi Tony,

On 7/18/2023 3:57 PM, Tony Luck wrote:
> On Tue, Jul 18, 2023 at 01:40:32PM -0700, Reinette Chatre wrote:
>>> +	[RDT_RESOURCE_NODE] =
>>> +	{
>>> +		.r_resctrl = {
>>> +			.rid			= RDT_RESOURCE_NODE,
>>> +			.name			= "L3",
>>> +			.scope			= SCOPE_NODE,
>>> +			.domains		= domain_init(RDT_RESOURCE_NODE),
>>> +			.fflags			= 0,
>>> +		},
>>> +	},
>>>  };
>>
>> So the new resource has the same name, from user perspective,
>> as RDT_RESOURCE_L3. From this perspective it thus seems to be a
>> shadow of RDT_RESOURCE_L3 that is used as alternative for some properties
>> of the actual RDT_RESOURCE_L3? This is starting to look as though this
>> solution is wrenching itself into current architecture.
>>
>> >From what I can tell the monitoring in SNC environment needs a different
>> domain list because of the change in scope. What else is needed in the
>> resource that is different from the existing L3 resource? Could the
>> monitoring scope of a resource not instead be made distinct from its
>> allocation scope? By default monitoring and allocation scope will be
>> the same and thus use the same domain list but when SNC is enabled
>> then monitoring uses a different domain list.
> 
> Answering this part first, because my choice here affects a bunch
> of the code that also raised comments from you.

Indeed.

> 
> The crux of the issue is that when SNC mode is enabled the scope
> for L3 monitoring functions changes to "node" scope, while the
> scope of L3 control functions (CAT, CDP) remains at L3 cache scope.
> 
> My solution was to just create a new resource. But you have an
> interesing alternate solution. Add an extra domain list to the
> resource structure to allow creation of distinct domain lists
> for this case where the scope for control and monitor functions
> differs.
> 
> So change the resource structure like this:
> 
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 8334eeacfec5..01590aa59a67 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -168,10 +168,12 @@ struct rdt_resource {
>  	bool			alloc_capable;
>  	bool			mon_capable;
>  	int			num_rmid;
> -	int			cache_level;
> +	int			ctrl_scope;
> +	int			mon_scope;

I am not sure about getting rid of cache_level so fast.
I see regarding the current problem being solved that
ctrl_scope would have the same values as cache_level but
I find that adding this level of indirection while keeping
the comparison with cacheinfo->level to create a trap
for future mistakes.

>  	struct resctrl_cache	cache;
>  	struct resctrl_membw	membw;
> -	struct list_head	domains;
> +	struct list_head	ctrl_domains;
> +	struct list_head	mon_domains;
>  	char			*name;
>  	int			data_width;
>  	u32			default_ctrl;
> 
> and build/use separate domain lists for when this resource is
> being referenced for allocation/monitoring. E.g. domain_add_cpu()
> would check "r->alloc_capable" and add a cpu to the ctrl_domains
> list based on the ctrl_scope value. It would do the same with
> mon_capable / mon_domains / mon_scope.
> 
> If ctrl_scope == mon_scope, just build one list as you suggest above.

Yes, this is the idea. Thank you for considering it. Something else
to consider that may make this even cleaner/simpler would be to review
struct rdt_domain and struct rdt_hw_domain members for "monitor" vs "control"
usage. These structs could potentially be split further into separate
"control" and "monitor" variants. For example, "struct rdt_domain" split into
"struct rdt_ctrl_domain" and "struct rdt_mon_domain". If there is a clean
split then resctrl can always create two lists with the unnecessary duplication
eliminated when two domain lists are created. This would also
eliminate the need to scatter ctrl_scope == mon_scope checks throughout. 

> 
> Maybe there are more places that walk the list of control domains than
> walk the list of monitor domains. Need to audit this set:
> 
> $ git grep list_for_each.*domains -- arch/x86/kernel/cpu/resctrl
> arch/x86/kernel/cpu/resctrl/core.c:     list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/core.c:     list_for_each(l, &r->domains) {
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c:      list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c:      list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c:      list_for_each_entry(dom, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/monitor.c:  list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c:              list_for_each_entry(d_i, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:         list_for_each_entry(dom, &r->domains, list)
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(dom, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:         list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:         list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(dom, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &r_l->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:         list_for_each_entry(dom, &r->domains, list)
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(dom, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &s->res->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &r->domains, list) {
> 
> Maybe "domains" can keep its name and make a "list_for_each_monitor_domain()" macro
> to pick the right list to walk?

It is not clear to me how "domains" can keep its name. If I understand
the macro would be useful if scope always needs to be considered. I wonder
if the list walkers may not mostly just walk the appropriate list directly
if resctrl always creates separate "control domain" and "monitor domain"
lists.

> I don't think this will reduce the amount of code change in a
> significant way. But it may be conceptually easier to follow
> what is going on.

Reducing the amount of code changed is not a goal to me. If I understand
correctly I think that adapting resctrl to support different monitor and
control scope could create a foundation into which SNC can slot in smoothly. 

Reinette