Re: [PATCH v12 7/8] x86/resctrl: Sub NUMA Cluster detection and enable

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 30, 2023 at 03:40:52PM -0800, Reinette Chatre wrote:
> Hi Tony,
> 
> On 11/30/2023 2:43 PM, Tony Luck wrote:
> > On Thu, Nov 30, 2023 at 01:47:10PM -0800, Reinette Chatre wrote:
> 
> ...
> 
> >>>  	if (!x86_match_cpu(snc_cpu_ids))
> >>>  		return 1;
> >>
> >> I understand and welcome this change as motivated by robustness. Apart
> >> from that, with this being a model specific feature for this particular
> >> group of systems, it it not clear to me in which scenarios this could
> >> run on a system where a present CPU does not have access to L3 cache.
> > 
> > Agreed that on these systems there should always be an L3 cache. Should
> > I drop the check for "-1"?
> 
> Please do keep it. I welcome the additional robustness. The static checker I
> tried did not complain about this but I expect that it is something that
> could trigger checks.
> 
> > 
> >>>  
> >>> -	node_caches = bitmap_zalloc(nr_node_ids, GFP_KERNEL);
> >>> +	node_caches = bitmap_zalloc(num_online_cpus(), GFP_KERNEL);
> >>
> >> Please do take care to take new bitmap size into account in all
> >> places. From what I can tell there is a later bitmap_weight() call that
> >> still uses nr_node_ids as size.
> > 
> > Oops. I was also using num_online_cpus() before cpus_read_lock(), so
> > things could theoretically change before the bitmap_weight() call.
> > I switched to using num_present_cpus() in both places.
> 
> Thanks for catching this. I am not sure if num_present_cpus() is the right
> choice. I found its comment to say "If HOTPLUG is enabled, then cpu_present_mask
> varies dynamically ...". num_possible_cpus() seems more appropriate when looking

I can size the bitmask based on num_possible_cpus().

> for something that does not change while not holding the hotplug lock. Reading its
> description more closely also makes me wonder if the later
> 	num_online_cpus() != num_present_cpus()
> should also maybe be 
> 	num_online_cpus() != num_possible_cpus() ?
> It seems to more closely match the intention.

This seems problematic. On a system that does support physical CPU
hotplug num_possible_cpus() may be some very large number. Reserving
space for CPUs that can be added later. None of those CPUs can be online
(obviously!). So this test would fail on such a system.

> >>>  	if (!node_caches)
> >>>  		return 1;
> >>>  
> >>> @@ -1072,10 +1073,13 @@ static __init int snc_get_config(void)
> >>>  
> >>>  	for_each_node(node) {
> >>>  		cpu = cpumask_first(cpumask_of_node(node));
> >>> -		if (cpu < nr_cpu_ids)
> >>> -			set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches);
> >>> -		else
> >>> +		if (cpu < nr_cpu_ids) {
> >>> +			cache_id = get_cpu_cacheinfo_id(cpu, 3);
> >>> +			if (cache_id != -1)
> >>> +				set_bit(cache_id, node_caches);
> >>> +		} else {
> >>>  			mem_only_nodes++;
> >>> +		}
> >>>  	}
> >>>  	cpus_read_unlock();
> >>>  
> >>
> >> Could this code be made even more robust by checking the computed
> >> snc_nodes_per_l3_cache against the limited actually possible values?
> >> Forcing it to 1 if something went wrong?
> > 
> > Added a couple of extra sanity checks. See updated incremental patch
> > below.
> 
> Thank you very much. The additional checks look good to me.
> 
> Reinette

Thanks for looking at this. I'm applying changes to my local tree. I'll
give folks a little more time to find additonal issues in v12 and post
v13 next week.

-Tony




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux