On 8/28/2023 1:59 PM, Tony Luck wrote:
> On Mon, Aug 28, 2023 at 12:56:27PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 8/28/2023 11:46 AM, Tony Luck wrote:
>>> On Mon, Aug 28, 2023 at 10:05:31AM -0700, Reinette Chatre wrote:
>>>> Hi Tony,
>>>>
>>>> On 8/25/2023 10:05 AM, Tony Luck wrote:
>>>>> On Fri, Aug 11, 2023 at 10:29:25AM -0700, Reinette Chatre wrote:
>>>>>> On 7/22/2023 12:07 PM, Tony Luck wrote:
>>>>
>>>>>>> Change all places where monitoring functions walk the list of
>>>>>>> domains to use the new "mondomains" list instead of the old
>>>>>>> "domains" list.
>>>>>>
>>>>>> I would not refer to it as "the old domains list" as it creates the
>>>>>> impression that this is being replaced. The changelog makes
>>>>>> no mention that the domains list will remain and be dedicated to
>>>>>> control domains. I think this is important to include in the
>>>>>> description of this change.
>>>>>
>>>>> I've rewritten the entire commit message incorporating your suggestions.
>>>>> V6 will be posted soon (after I get some time on an SNC SPR to check
>>>>> that it all works!)
>>>>
>>>> I seem to have missed v5.
>>>
>>> I simply can't count. You are correct that the next version to be posted
>>> will be v5.
>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
>>>>>>> ---
>>>>>>>  include/linux/resctrl.h                   |  10 +-
>>>>>>>  arch/x86/kernel/cpu/resctrl/internal.h    |   2 +-
>>>>>>>  arch/x86/kernel/cpu/resctrl/core.c        | 195 +++++++++++++++-------
>>>>>>>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |   2 +-
>>>>>>>  arch/x86/kernel/cpu/resctrl/monitor.c     |   2 +-
>>>>>>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  30 ++--
>>>>>>>  6 files changed, 167 insertions(+), 74 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>>>>>> index 8334eeacfec5..1267d56f9e76 100644
>>>>>>> --- a/include/linux/resctrl.h
>>>>>>> +++ b/include/linux/resctrl.h
>>>>>>> @@ -151,9 +151,11 @@ struct resctrl_schema;
>>>>>>>   * @mon_capable:    Is monitor feature available on this machine
>>>>>>>   * @num_rmid:       Number of RMIDs available
>>>>>>>   * @cache_level:    Which cache level defines scope of this resource
>>>>>>> + * @mon_scope:      Scope of this resource if different from cache_level
>>>>>>
>>>>>> I think this addition should be deferred. As it is here, the "if different
>>>>>> from cache_level" also creates many questions (when will it be different?
>>>>>> how will it be determined that the scope is different in order to know that
>>>>>> mon_scope should be used?)
>>>>>
>>>>> I've gone in a different direction. V6 renames "cache_level" to
>>>>> "ctrl_scope". I think this makes intent clear from step #1.
>>>>>
>>>>
>>>> This change is not clear to me. Previously you changed this name
>>>> but kept using it in code specific to cache levels. It is not clear
>>>> to me how this time's rename would be different.
>>>
>>> The current "cache_level" field in the structure describes
>>> the scope of each instance using the cache level (2 or 3) as the
>>> method to describe which CPUs are considered part of a group. Currently
>>> the scope is the same for both control and monitor resources.
>>
>> Right.
>>
>>>
>>> Would you like to see patches in this progression:
>>>
>>> 1) Rename "cache_level" to "scope", with a commit comment that future
>>> patches are going to base the scope on NUMA nodes in addition to sharing
>>> caches at particular levels, and will split it into separate control and
>>> monitor scope.
>>>
>>> 2) Split the "scope" field from the first patch into "ctrl_scope" and
>>> "mon_scope" (also with the addition of the new list for the mon_scope).
>>>
>>> 3) Add "node" as a new option for scope in addition to L3 and L2 cache.
>>>
>>
>> hmmm - my comment cannot be addressed through patch re-ordering.
>> If I understand correctly you plan to change the name of "cache_level"
>> to "ctrl_scope". My comment is that this obfuscates the code as long as
>> you use this variable to compare against data that can only represent cache
>> levels. This just repeats what I wrote in
>> https://lore.kernel.org/lkml/09847c37-66d7-c286-a313-308eaa338c64@xxxxxxxxx/
>
> I'm proposing more than just re-ordering. The above sequence adds a
> couple of extra patches to the series.
>
> Existing state of code:
>
> There is a single field named "cache_level" that describes how
> CPUs are assigned to domains based on their sharing of a cache
> at a particular level. Hard-coded values of "2" and "3" are used
> to describe the level. This is just a scope description of which
> CPUs are grouped together. But it is limited to doing so only
> based on which caches are shared by those CPUs.
>
> Step 1:
>
> Change the name of the field s/cache_level/scope/. Provide an
> enum with values RESCTRL_L3_CACHE, RESCTRL_L2_CACHE and use
> those throughout the code instead of 3, 2, and implicitly passing
> the resctrl scope to functions like get_cpu_cacheinfo_id().
>
> Add a get_domain_id_from_scope() function that takes the enum
> values for scope and converts them to "3", "2" to pass to
> get_cpu_cacheinfo_id().
>
> No functional change. Just broadening the meaning of the field
> so that future patches can use it to describe scopes that
> aren't a function of sharing a cache at a particular level.

Right. So if I understand correctly, you are planning the same change as
you did in V3, like below, and my original comment still stands.

@@ -1348,7 +1348,7 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
 	num_b = bitmap_weight(&cbm, r->cache.cbm_len);
 	ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask));
 	for (i = 0; i < ci->num_leaves; i++) {
-		if (ci->info_list[i].level == r->cache_level) {
+		if (ci->info_list[i].level == r->scope) {
 			size = ci->info_list[i].size / r->cache.cbm_len * num_b;
 			break;
 		}
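For concreteness, the "Step 1" shape described above would look roughly like
the sketch below. The enum names and the get_domain_id_from_scope() helper
come from the description in this thread; the numeric values and the function
body are illustrative assumptions, not a posted patch.

#include <linux/cacheinfo.h>	/* get_cpu_cacheinfo_id() */
#include <linux/errno.h>

enum resctrl_scope {
	RESCTRL_L2_CACHE = 2,
	RESCTRL_L3_CACHE = 3,
};

static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
{
	switch (scope) {
	case RESCTRL_L2_CACHE:
	case RESCTRL_L3_CACHE:
		/* Assumed: enum values match the cache level they name. */
		return get_cpu_cacheinfo_id(cpu, scope);
	default:
		break;
	}

	return -EINVAL;
}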
>>>>>>> @@ -502,61 +593,19 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
>>>>>>>   */
>>>>>>>  static void domain_add_cpu(int cpu, struct rdt_resource *r)
>>>>>>>  {
>>>>>>> -	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
>>>>>>> -	struct list_head *add_pos = NULL;
>>>>>>> -	struct rdt_hw_domain *hw_dom;
>>>>>>> -	struct rdt_domain *d;
>>>>>>> -	int err;
>>>>>>> -
>>>>>>> -	d = rdt_find_domain(r, id, &add_pos);
>>>>>>> -	if (IS_ERR(d)) {
>>>>>>> -		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
>>>>>>> -		return;
>>>>>>> -	}
>>>>>>> -
>>>>>>> -	if (d) {
>>>>>>> -		cpumask_set_cpu(cpu, &d->cpu_mask);
>>>>>>> -		if (r->cache.arch_has_per_cpu_cfg)
>>>>>>> -			rdt_domain_reconfigure_cdp(r);
>>>>>>> -		return;
>>>>>>> -	}
>>>>>>> -
>>>>>>> -	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
>>>>>>> -	if (!hw_dom)
>>>>>>> -		return;
>>>>>>> -
>>>>>>> -	d = &hw_dom->d_resctrl;
>>>>>>> -	d->id = id;
>>>>>>> -	cpumask_set_cpu(cpu, &d->cpu_mask);
>>>>>>> -
>>>>>>> -	rdt_domain_reconfigure_cdp(r);
>>>>>>> -
>>>>>>> -	if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
>>>>>>> -		domain_free(hw_dom);
>>>>>>> -		return;
>>>>>>> -	}
>>>>>>> -
>>>>>>> -	if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
>>>>>>> -		domain_free(hw_dom);
>>>>>>> -		return;
>>>>>>> -	}
>>>>>>> -
>>>>>>> -	list_add_tail(&d->list, add_pos);
>>>>>>> -
>>>>>>> -	err = resctrl_online_domain(r, d);
>>>>>>> -	if (err) {
>>>>>>> -		list_del(&d->list);
>>>>>>> -		domain_free(hw_dom);
>>>>>>> -	}
>>>>>>> +	if (r->alloc_capable)
>>>>>>> +		domain_add_cpu_ctrl(cpu, r);
>>>>>>> +	if (r->mon_capable)
>>>>>>> +		domain_add_cpu_mon(cpu, r);
>>>>>>>  }
>>>>>>
>>>>>> A resource could be both alloc and mon capable ... both
>>>>>> domain_add_cpu_ctrl() and domain_add_cpu_mon() can fail.
>>>>>> Should domain_add_cpu_mon() still be run for a CPU if
>>>>>> domain_add_cpu_ctrl() failed?
>>>>>>
>>>>>> Looking ahead, the CPU should probably also not be added
>>>>>> to the default group's mask if a failure occurred.
>>>>>
>>>>> Existing code doesn't do anything for the case where a CPU
>>>>> can't be added to a domain (probably the only real error case
>>>>> is failure to allocate memory for the domain structure).
>>>>
>>>> Is my statement about the CPU being added to the default group's mask
>>>> incorrect? Seems like a potential issue related to the domain's
>>>> CPU mask also.
>>>>
>>>> Please see my earlier question. Existing code does not proceed
>>>> with monitor initialization if control initialization fails and
>>>> undoes control initialization if monitor initialization fails.
>>>
>>> Existing code silently continues if a domain structure cannot
>>> be allocated to add a CPU to a domain:
>>>
>>> 503 static void domain_add_cpu(int cpu, struct rdt_resource *r)
>>> 504 {
>>> 505 	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
>>> 506 	struct list_head *add_pos = NULL;
>>> 507 	struct rdt_hw_domain *hw_dom;
>>> 508 	struct rdt_domain *d;
>>> 509 	int err;
>>>
>>> ...
>>>
>>> 523
>>> 524 	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
>>> 525 	if (!hw_dom)
>>> 526 		return;
>>> 527
>>>
>>
>> Right ... and if it returns silently as above it runs:
>>
>> static int resctrl_online_cpu(unsigned int cpu)
>> {
>>
>>
>> 	for_each_capable_rdt_resource(r)
>> 		domain_add_cpu(cpu, r);
>> >>>>>	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);   <<<<<<<<
>>
>> }
>>
>> Also, note within domain_add_cpu():
>>
>> static void domain_add_cpu(int cpu, struct rdt_resource *r)
>> {
>>
>>
>> 	...
>> 	if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
>> 		domain_free(hw_dom);
>> 		return;
>> 	}
>>
>> 	if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
>> 		domain_free(hw_dom);
>> 		return;
>> 	}
>>
>> 	...
>> }
>>
>> The above is the other item that I've been trying to discuss
>> with you. Note that existing resctrl will not initialize monitoring if
>> control could not be initialized.
>> Compare with this submission:
>>
>> 	if (r->alloc_capable)
>> 		domain_add_cpu_ctrl(cpu, r);
>> 	if (r->mon_capable)
>> 		domain_add_cpu_mon(cpu, r);
>>
>> I'll stop trying

Reinette
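To make the error-handling concern above concrete, here is a purely
illustrative sketch. It assumes, unlike the submission under review (whose
helpers are void), that domain_add_cpu_ctrl() and domain_add_cpu_mon() return
an error code, so the caller can preserve the existing behaviour of skipping
monitor setup when control setup fails and report the failure upward so that
resctrl_online_cpu() could avoid adding the CPU to rdtgroup_default.cpu_mask.

/*
 * Illustrative sketch only: the split helpers are assumed to return an
 * error code here. With that assumption, domain_add_cpu() keeps the
 * existing semantics (no monitor setup when control setup fails) and
 * propagates the failure to its caller.
 */
static int domain_add_cpu(int cpu, struct rdt_resource *r)
{
	int err;

	if (r->alloc_capable) {
		err = domain_add_cpu_ctrl(cpu, r);
		if (err)
			return err;
	}

	if (r->mon_capable) {
		err = domain_add_cpu_mon(cpu, r);
		if (err)
			return err;
	}

	return 0;
}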