On Mon, Aug 28, 2023 at 02:20:04PM -0700, Reinette Chatre wrote:
>
>
> On 8/28/2023 1:59 PM, Tony Luck wrote:
> > On Mon, Aug 28, 2023 at 12:56:27PM -0700, Reinette Chatre wrote:
> >> Hi Tony,
> >>
> >> On 8/28/2023 11:46 AM, Tony Luck wrote:
> >>> On Mon, Aug 28, 2023 at 10:05:31AM -0700, Reinette Chatre wrote:
> >>>> Hi Tony,
> >>>>
> >>>> On 8/25/2023 10:05 AM, Tony Luck wrote:
> >>>>> On Fri, Aug 11, 2023 at 10:29:25AM -0700, Reinette Chatre wrote:
> >>>>>> On 7/22/2023 12:07 PM, Tony Luck wrote:
> >>>>
> >>>>>>> Change all places where monitoring functions walk the list of
> >>>>>>> domains to use the new "mondomains" list instead of the old
> >>>>>>> "domains" list.
> >>>>>>
> >>>>>> I would not refer to it as "the old domains list" as it creates
> >>>>>> impression that this is being replaced. The changelog makes
> >>>>>> no mention that domains list will remain and be dedicated to
> >>>>>> control domains. I think this is important to include in description
> >>>>>> of this change.
> >>>>>
> >>>>> I've rewritten the entire commit message incorporating your suggestions.
> >>>>> V6 will be posted soon (after I get some time on an SNC SPR to check
> >>>>> that it all works!)
> >>>>
> >>>> I seem to have missed v5.
> >>>
> >>> I simply can't count. You are correct that next version to be posted
> >>> will be v5.
> >>>
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> >>>>>>> ---
> >>>>>>>  include/linux/resctrl.h                   |  10 +-
> >>>>>>>  arch/x86/kernel/cpu/resctrl/internal.h    |   2 +-
> >>>>>>>  arch/x86/kernel/cpu/resctrl/core.c        | 195 +++++++++++++++-------
> >>>>>>>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |   2 +-
> >>>>>>>  arch/x86/kernel/cpu/resctrl/monitor.c     |   2 +-
> >>>>>>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  30 ++--
> >>>>>>>  6 files changed, 167 insertions(+), 74 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> >>>>>>> index 8334eeacfec5..1267d56f9e76 100644
> >>>>>>> --- a/include/linux/resctrl.h
> >>>>>>> +++ b/include/linux/resctrl.h
> >>>>>>> @@ -151,9 +151,11 @@ struct resctrl_schema;
> >>>>>>>   * @mon_capable:	Is monitor feature available on this machine
> >>>>>>>   * @num_rmid:		Number of RMIDs available
> >>>>>>>   * @cache_level:	Which cache level defines scope of this resource
> >>>>>>> + * @mon_scope:		Scope of this resource if different from cache_level
> >>>>>>
> >>>>>> I think this addition should be deferred. As it is here, the "if different
> >>>>>> from cache_level" also creates many questions (when will it be different?
> >>>>>> how will it be determined that the scope is different in order to know that
> >>>>>> mon_scope should be used?)
> >>>>>
> >>>>> I've gone in a different direction. V6 renames "cache_level" to
> >>>>> "ctrl_scope". I think this makes intent clear from step #1.
> >>>>>
> >>>>
> >>>> This change is not clear to me. Previously you changed this name
> >>>> but kept using it in code specific to cache levels. It is not clear
> >>>> to me how this time's rename would be different.
> >>>
> >>> The current "cache_level" field in the structure is describing
> >>> the scope of each instance using the cache level (2 or 3) as the
> >>> method to describe which CPUs are considered part of a group. Currently
> >>> the scope is the same for both control and monitor resources.
> >>
> >> Right.
> >>
> >>>
> >>> Would you like to see patches in this progression:
> >>>
> >>> 1) Rename "cache_level" to "scope".
> >>>    With commit comment that future patches are going to base the scope
> >>>    on NUMA nodes in addition to sharing caches at particular levels, and
> >>>    will split into separate control and monitor scope.
> >>>
> >>> 2) Split the "scope" field from first patch into "ctrl_scope" and
> >>>    "mon_scope" (also with the addition of the new list for the mon_scope).
> >>>
> >>> 3) Add "node" as a new option for scope in addition to L3 and L2 cache.
> >>>
> >>
> >> hmmm - my comment cannot be addressed through patch re-ordering.
> >> If I understand correctly you plan to change the name of "cache_level"
> >> to "ctrl_scope". My comment is that this obfuscates the code as long as
> >> you use this variable to compare against data that can only represent cache
> >> levels. This just repeats what I wrote in
> >> https://lore.kernel.org/lkml/09847c37-66d7-c286-a313-308eaa338c64@xxxxxxxxx/
> >
> > I'm proposing more than just re-ordering. The above sequence is a
> > couple of extra patches in the series.
> >
> > Existing state of code:
> >
> > There is a single field named "cache_level" that describes how
> > CPUs are assigned to domains based on their sharing of a cache
> > at a particular level. Hard-coded values of "2" and "3" are used
> > to describe the level. This is just a scope description of which
> > CPUs are grouped together. But it is limited to just doing so
> > based on which caches are shared by those CPUs.
> >
> > Step 1:
> >
> > Change the name of the field s/cache_level/scope/. Provide an
> > enum with values RESCTRL_L3_CACHE, RESCTRL_L2_CACHE and use
> > those throughout the code instead of 3 and 2, and instead of
> > implicitly passing the resctrl scope to functions like
> > get_cpu_cacheinfo_id().
> >
> > Add a get_domain_id_from_scope() function that takes the enum
> > values for scope and converts to "3", "2" to pass to
> > get_cpu_cacheinfo_id().
> >
> > No functional change. Just broadening the meaning of the field
> > so that it can, in future patches, describe scopes that
> > aren't a function of sharing a cache at a particular level.
>
> Right. So if I understand you are planning the same change as you did
> in V3, like below, and my original comment still stands.
>
> @@ -1348,7 +1348,7 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
>  	num_b = bitmap_weight(&cbm, r->cache.cbm_len);
>  	ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask));
>  	for (i = 0; i < ci->num_leaves; i++) {
> -		if (ci->info_list[i].level == r->cache_level) {
> +		if (ci->info_list[i].level == r->scope) {
>  			size = ci->info_list[i].size / r->cache.cbm_len * num_b;
>  			break;

No. I heard your complaint about mixing "scope" and "cache_level".
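
To make the Step 1 helper described above concrete, it would be roughly
along these lines (a sketch only, not the posted patch; the enum type
name and exact signature here are placeholders):

/*
 * Sketch only: convert a resctrl scope to the id used to group CPUs
 * into domains. Cache scopes still go through get_cpu_cacheinfo_id();
 * a later patch would add a node-based scope (e.g. via cpu_to_node()).
 */
enum resctrl_scope {
	RESCTRL_L2_CACHE,
	RESCTRL_L3_CACHE,
};

static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
{
	switch (scope) {
	case RESCTRL_L2_CACHE:
		return get_cpu_cacheinfo_id(cpu, 2);
	case RESCTRL_L3_CACHE:
		return get_cpu_cacheinfo_id(cpu, 3);
	default:
		break;
	}

	return -EINVAL;
}
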
The latest version of rdtgroup_cbm_to_size() (the code quoted above)
looks like this:

unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
				  struct rdt_domain *d, unsigned long cbm)
{
	struct cpu_cacheinfo *ci;
	unsigned int size = 0;
	int cache_level;
	int num_b, i;

	switch (r->ctrl_scope) {
	case RESCTRL_L3_CACHE:
		cache_level = 3;
		break;
	case RESCTRL_L2_CACHE:
		cache_level = 2;
		break;
	default:
		WARN_ON_ONCE(1);
		return size;
	}

	num_b = bitmap_weight(&cbm, r->cache.cbm_len);
	ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask));
	for (i = 0; i < ci->num_leaves; i++) {
		if (ci->info_list[i].level == cache_level) {
			size = ci->info_list[i].size / r->cache.cbm_len * num_b;
			break;
		}
	}

	return size / snc_nodes_per_l3_cache;
}

>  	}
> >
> >>>>>>> @@ -502,61 +593,19 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
> >>>>>>>   */
> >>>>>>>  static void domain_add_cpu(int cpu, struct rdt_resource *r)
> >>>>>>>  {
> >>>>>>> -	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
> >>>>>>> -	struct list_head *add_pos = NULL;
> >>>>>>> -	struct rdt_hw_domain *hw_dom;
> >>>>>>> -	struct rdt_domain *d;
> >>>>>>> -	int err;
> >>>>>>> -
> >>>>>>> -	d = rdt_find_domain(r, id, &add_pos);
> >>>>>>> -	if (IS_ERR(d)) {
> >>>>>>> -		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
> >>>>>>> -		return;
> >>>>>>> -	}
> >>>>>>> -
> >>>>>>> -	if (d) {
> >>>>>>> -		cpumask_set_cpu(cpu, &d->cpu_mask);
> >>>>>>> -		if (r->cache.arch_has_per_cpu_cfg)
> >>>>>>> -			rdt_domain_reconfigure_cdp(r);
> >>>>>>> -		return;
> >>>>>>> -	}
> >>>>>>> -
> >>>>>>> -	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
> >>>>>>> -	if (!hw_dom)
> >>>>>>> -		return;
> >>>>>>> -
> >>>>>>> -	d = &hw_dom->d_resctrl;
> >>>>>>> -	d->id = id;
> >>>>>>> -	cpumask_set_cpu(cpu, &d->cpu_mask);
> >>>>>>> -
> >>>>>>> -	rdt_domain_reconfigure_cdp(r);
> >>>>>>> -
> >>>>>>> -	if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
> >>>>>>> -		domain_free(hw_dom);
> >>>>>>> -		return;
> >>>>>>> -	}
> >>>>>>> -
> >>>>>>> -	if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> >>>>>>> -		domain_free(hw_dom);
> >>>>>>> -		return;
> >>>>>>> -	}
> >>>>>>> -
> >>>>>>> -	list_add_tail(&d->list, add_pos);
> >>>>>>> -
> >>>>>>> -	err = resctrl_online_domain(r, d);
> >>>>>>> -	if (err) {
> >>>>>>> -		list_del(&d->list);
> >>>>>>> -		domain_free(hw_dom);
> >>>>>>> -	}
> >>>>>>> +	if (r->alloc_capable)
> >>>>>>> +		domain_add_cpu_ctrl(cpu, r);
> >>>>>>> +	if (r->mon_capable)
> >>>>>>> +		domain_add_cpu_mon(cpu, r);
> >>>>>>>  }
> >>>>>>
> >>>>>> A resource could be both alloc and mon capable ... both
> >>>>>> domain_add_cpu_ctrl() and domain_add_cpu_mon() can fail.
> >>>>>> Should domain_add_cpu_mon() still be run for a CPU if
> >>>>>> domain_add_cpu_ctrl() failed?
> >>>>>>
> >>>>>> Looking ahead the CPU should probably also not be added
> >>>>>> to the default group's mask if a failure occurred.
> >>>>>
> >>>>> Existing code doesn't do anything for the case where a CPU
> >>>>> can't be added to a domain (probably the only real error case
> >>>>> is failure to allocate memory for the domain structure).
> >>>>
> >>>> Is my statement about the CPU being added to the default group's
> >>>> mask incorrect? Seems like a potential issue related to the domain's
> >>>> CPU mask also.
> >>>>
> >>>> Please see my earlier question.
> >>>> Existing code does not proceed
> >>>> with monitor initialization if control initialization fails and
> >>>> undoes control initialization if monitor initialization fails.
> >>>
> >>> Existing code silently continues if a domain structure cannot
> >>> be allocated to add a CPU to a domain:
> >>>
> >>> static void domain_add_cpu(int cpu, struct rdt_resource *r)
> >>> {
> >>> 	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
> >>> 	struct list_head *add_pos = NULL;
> >>> 	struct rdt_hw_domain *hw_dom;
> >>> 	struct rdt_domain *d;
> >>> 	int err;
> >>>
> >>> ...
> >>>
> >>> 	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
> >>> 	if (!hw_dom)
> >>> 		return;
> >>>
> >>
> >>
> >> Right ... and if it returns silently as above it runs:
> >>
> >> static int resctrl_online_cpu(unsigned int cpu)
> >> {
> >>
> >>
> >> 	for_each_capable_rdt_resource(r)
> >> 		domain_add_cpu(cpu, r);
> >> >>>>>	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask); <<<<<<<<
> >>
> >> }
> >>
> >> Also, note within domain_add_cpu():
> >>
> >> static void domain_add_cpu(int cpu, struct rdt_resource *r)
> >> {
> >>
> >>
> >> ...
> >> 	if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
> >> 		domain_free(hw_dom);
> >> 		return;
> >> 	}
> >>
> >> 	if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> >> 		domain_free(hw_dom);
> >> 		return;
> >> 	}
> >>
> >> ...
> >> }
> >>
> >> The above is the other item that I've been trying to discuss
> >> with you. Note that existing resctrl will not initialize monitoring if
> >> control could not be initialized.
> >> Compare with this submission:
> >>
> >> 	if (r->alloc_capable)
> >> 		domain_add_cpu_ctrl(cpu, r);
> >> 	if (r->mon_capable)
> >> 		domain_add_cpu_mon(cpu, r);
> >>
> >>
>
> I'll stop trying.

I'll see what I can do to improve the error handling.

-Tony
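
P.S. One possible shape for that error handling - purely an untested
sketch, not something from the posted series, and it assumes
domain_add_cpu_ctrl()/domain_add_cpu_mon() are changed to return an
error:

/*
 * Untested sketch: let the split helpers report failure so monitor
 * setup is skipped when control setup fails, and so the CPU is not
 * added to the default group's mask when domain setup failed.
 * Locking and cleanup of partially initialized domains are omitted.
 */
static int domain_add_cpu(int cpu, struct rdt_resource *r)
{
	int err;

	if (r->alloc_capable) {
		err = domain_add_cpu_ctrl(cpu, r);
		if (err)
			return err;
	}
	if (r->mon_capable)
		return domain_add_cpu_mon(cpu, r);

	return 0;
}

static int resctrl_online_cpu(unsigned int cpu)
{
	struct rdt_resource *r;
	int err;

	for_each_capable_rdt_resource(r) {
		err = domain_add_cpu(cpu, r);
		if (err)
			return err;
	}
	/* Only claim the CPU for the default group once setup succeeded */
	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);

	return 0;
}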