Re: [PATCH v4 7/7] selftests/resctrl: Adjust effective L3 cache size when SNC enabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 11, 2023 at 10:33:43AM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 7/22/2023 12:07 PM, Tony Luck wrote:
> > Sub-NUMA Cluster divides CPUs sharing an L3 cache into separate NUMA
> > nodes. Systems may support splitting into either two or four nodes.
> > 
> > When SNC mode is enabled the effective amount of L3 cache available
> > for allocation is divided by the number of nodes per L3.
> > 
> > Detect which SNC mode is active by comparing the number of CPUs
> > that share a cache with CPU0, with the number of CPUs on node0.
> > 
> > Reported-by: "Shaopeng Tan (Fujitsu)" <tan.shaopeng@xxxxxxxxxxx>
> > Closes: https://lore.kernel.org/r/TYAPR01MB6330B9B17686EF426D2C3F308B25A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> 
> This does not seem to be the case when looking at
> https://lore.kernel.org/all/TYAPR01MB6330A4EB3633B791939EA45E8B39A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

Correct. I'll drop the "Closes:" tag.  I'm not sure what
the status is. Shaopeng didn't respond to my suggestion
to try "taskset(1)" when running the tests to check if
NUMA effects are causing the test to fail.

> 
> > Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> > ---
> >  tools/testing/selftests/resctrl/resctrl.h   |  1 +
> >  tools/testing/selftests/resctrl/resctrlfs.c | 57 +++++++++++++++++++++
> >  2 files changed, 58 insertions(+)
> > 
> > diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
> > index 87e39456dee0..a8b43210b573 100644
> > --- a/tools/testing/selftests/resctrl/resctrl.h
> > +++ b/tools/testing/selftests/resctrl/resctrl.h
> > @@ -13,6 +13,7 @@
> >  #include <signal.h>
> >  #include <dirent.h>
> >  #include <stdbool.h>
> > +#include <ctype.h>
> >  #include <sys/stat.h>
> >  #include <sys/ioctl.h>
> >  #include <sys/mount.h>
> > diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
> > index fb00245dee92..79eecbf9f863 100644
> > --- a/tools/testing/selftests/resctrl/resctrlfs.c
> > +++ b/tools/testing/selftests/resctrl/resctrlfs.c
> > @@ -130,6 +130,61 @@ int get_resource_id(int cpu_no, int *resource_id)
> >  	return 0;
> >  }
> >  
> > +/*
> > + * Count number of CPUs in a /sys bit map
> > + */
> > +static int count_sys_bitmap_bits(char *name)
> > +{
> > +	FILE *fp = fopen(name, "r");
> > +	int count = 0, c;
> > +
> > +	if (!fp)
> > +		return 0;
> > +
> > +	while ((c = fgetc(fp)) != EOF) {
> > +		if (!isxdigit(c))
> > +			continue;
> > +		switch (c) {
> > +		case 'f':
> > +			count++;
> > +		case '7': case 'b': case 'd': case 'e':
> > +			count++;
> > +		case '3': case '5': case '6': case '9': case 'a': case 'c':
> > +			count++;
> > +		case '1': case '2': case '4': case '8':
> > +			count++;
> > +		}
> > +	}
> > +	fclose(fp);
> > +
> > +	return count;
> > +}
> > +
> > +/*
> > + * Detect SNC by compating #CPUs in node0 with #CPUs sharing LLC with CPU0
> > + * Try to get this right, even if a few CPUs are offline so that the number
> > + * of CPUs in node0 is not exactly half or a quarter of the CPUs sharing the
> > + * LLC of CPU0.
> > + */
> > +static int snc_ways(void)
> > +{
> > +	int node_cpus, cache_cpus;
> > +
> > +	node_cpus = count_sys_bitmap_bits("/sys/devices/system/node/node0/cpumap");
> > +	cache_cpus = count_sys_bitmap_bits("/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_map");
> > +
> > +	if (!node_cpus || !cache_cpus) {
> > +		fprintf(stderr, "Warning could not determine Sub-NUMA Cluster mode\n");
> > +		return 1;
> > +	}
> > +
> > +	if (4 * node_cpus >= cache_cpus)
> > +		return 4;
> > +	else if (2 * node_cpus >= cache_cpus)
> > +		return 2;
> > +	return 1;
> > +}
> > +
> >  /*
> >   * get_cache_size - Get cache size for a specified CPU
> >   * @cpu_no:	CPU number
> > @@ -190,6 +245,8 @@ int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size)
> >  			break;
> >  	}
> >  
> > +	if (cache_num == 3)
> > +		*cache_size /= snc_ways();
> >  	return 0;
> >  }
> >  
> 
> I am surprised that this small change is sufficient. The resctrl
> selftests are definitely not NUMA aware and the CAT and CMT tests
> are not taking that into account when picking CPUs to run on. From
> what I understand LLC occupancy counters need to be added in this
> scenario but I do not see that done either.

This is a first step (the tests are definitely going to fail if
they have incorrect information about the cache size).

For a fully reliable set of tests some major surgery will be required
to bind to CPUs and memory to control allocation and access.

> 
> Reinette

-Tony



[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux