Re: [PATCH v2 24/24] selftests/resctrl: Rewrite Cache Allocation Technology (CAT) test

Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> · Wed, 26 Apr 2023 16:58:42 +0300 (EEST)

On Fri, 21 Apr 2023, Reinette Chatre wrote:

> Hi Ilpo,
> 
> On 4/18/2023 4:45 AM, Ilpo Järvinen wrote:
> > CAT test spawns two processes into two different control groups with
> > exclusive schemata. Both the processes alloc a buffer from memory
> > matching their allocated LLC block size and flush the entire buffer out
> > of caches. Since the processes are reading through the buffer only once
> > during the measurement and initially all the buffer was flushed, the
> > test isn't testing CAT.
> > 
> > Rewrite the CAT test to allocated a buffer sized to half of LLC. Then
> 
> "allocated a buffer" -> "allocate a buffer" ?
> 
> > perform a sequence of tests with different LLC alloc sizes starting
> > from half of the CBM bits down to 1-bit CBM. Flush the buffer before
> > each test and read the buffer twice. Observe the LLC misses on the
> > second read through the buffer. As the allocated LLC block gets smaller
> > and smaller, the LLC misses will become larger and larger giving a
> > strong signal on CAT working properly.
> 
> Since the changelog starts by describing the CAT test needing two
> processes I think it would help to highlight that this test uses a
> single process. I think it would also help to describing how the cache
> is used by the rest while this test is running.

Sure, good points, I'll add the info.

> > Suggested-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
> > ---
> >  tools/testing/selftests/resctrl/cache.c    |  20 +-
> >  tools/testing/selftests/resctrl/cat_test.c | 204 +++++++++------------
> >  2 files changed, 97 insertions(+), 127 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
> > index 7970239413da..64f08ba5edc2 100644
> > --- a/tools/testing/selftests/resctrl/cache.c
> > +++ b/tools/testing/selftests/resctrl/cache.c
> > @@ -224,10 +224,10 @@ int measure_llc_resctrl(struct resctrl_val_param *param, int bm_pid)
> >   */
> >  int cat_val(struct resctrl_val_param *param)
> >  {
> > -	int memflush = 1, operation = 0, ret = 0;
> >  	char *resctrl_val = param->resctrl_val;
> >  	unsigned long llc_perf_miss = 0;
> >  	pid_t bm_pid;
> > +	int ret;
> >  
> >  	if (strcmp(param->filename, "") == 0)
> >  		sprintf(param->filename, "stdio");
> > @@ -245,6 +245,10 @@ int cat_val(struct resctrl_val_param *param)
> >  	if (ret)
> >  		return ret;
> >  
> > +	ret = alloc_buffer(param->span, 1);
> > +	if (ret)
> > +		return ret;
> > +
> >  	initialize_llc_perf();
> >  
> >  	/* Test runs until the callback setup() tells the test to stop. */
> > @@ -256,17 +260,15 @@ int cat_val(struct resctrl_val_param *param)
> >  		}
> >  		if (ret < 0)
> >  			break;
> > +
> > +		flush_buffer(param->span);
> > +		use_buffer(param->span, 0, true);
> > +
> >  		ret = reset_enable_llc_perf(bm_pid, param->cpu_no);
> >  		if (ret)
> >  			break;
> >  
> > -		if (run_fill_buf(param->span, memflush, operation, true)) {
> > -			fprintf(stderr, "Error-running fill buffer\n");
> > -			ret = -1;
> > -			break;
> > -		}
> > -
> > -		sleep(1);
> > +		use_buffer(param->span, 0, true);
> >  
> >  		/* Measure cache miss from perf */
> >  		ret = get_llc_perf(&llc_perf_miss);
> > @@ -279,6 +281,8 @@ int cat_val(struct resctrl_val_param *param)
> >  			break;
> >  	}
> >  
> > +	free_buffer();
> > +
> >  	return ret;
> >  }
> >  
> > diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
> > index 4b505fdb35d7..85053829b9c5 100644
> > --- a/tools/testing/selftests/resctrl/cat_test.c
> > +++ b/tools/testing/selftests/resctrl/cat_test.c
> > @@ -11,11 +11,12 @@
> >  #include "resctrl.h"
> >  #include <unistd.h>
> >  
> > -#define RESULT_FILE_NAME1	"result_cat1"
> > -#define RESULT_FILE_NAME2	"result_cat2"
> > -#define NUM_OF_RUNS		5
> > -#define MAX_DIFF_PERCENT	4
> > -#define MAX_DIFF		1000000
> > +#define RESULT_FILE_NAME		"result_cat"
> > +#define NUM_OF_RUNS			5
> > +#define MIN_DIFF_PERCENT_PER_BIT	2
> 
> Could you please start a new trend that adds documentation
> that explains what this constant means and how it was chosen?

I can try although that particular 2 was a bit handwavy that just seems to 
work with the tests I performed.

> > +static unsigned long current_mask;
> > +static long prev_avg_llc_val;
> >  
> >  /*
> >   * Change schemata. Write schemata to specified
> > @@ -28,13 +29,24 @@ static int cat_setup(struct resctrl_val_param *p)
> >  	int ret = 0;
> >  
> >  	/* Run NUM_OF_RUNS times */
> > -	if (p->num_of_runs >= NUM_OF_RUNS)
> > -		return END_OF_TESTS;
> > +	if (p->num_of_runs >= NUM_OF_RUNS) {
> > +		/* Remove one bit from the consecutive block */
> > +		current_mask &= current_mask >> 1;
> > +		if (!current_mask)
> > +			return END_OF_TESTS;
> > +
> > +		p->num_of_runs = 0;
> 
> This seems like a workaround to get the schemata to be written. It is
> problematic since now p->num_of_runs no longer accurately reflects the
> number of test runs.

This is already the case. MBA test works around this very same problem by 
using a custom static variable (runs_per_allocation) which is reset to 0 
every NUM_OF_RUNS tests and not keeping ->num_of_runs at all. If MBA test 
would replace runs_per_allocation with use of ->num_of_runs, it would 
match what the new CAT test does.

Nothing currently relies on ->num_of_runs counting across the different 
"tests" that are run inside CAT and MBA tests. And I don't have anything 
immediately around the corner that would require ->num_of_runs to count 
total number of repetitions that were ran.

I guess it would be possible to attempt to consolidate that second layer
MBA and the rewritten CAT tests need somehow into resctrl_val_param. But 
IMHO that too is low-prio refactor as nothing is broken as is.

> I was expecting this mask manipulation to be
> in cat_val() so that it is clear how test works instead of part
> of the logic handled here.

That seems to be moving into opposite direction from how things are 
currently handled. Doing it in cat_val() would be relying less on 
->setup(). If that's the preferred direction, then the question becomes, 
should CAT test do anything in ->setup() because also the schemata 
writing could be done in directly cat_val().

What I would prefer not to do is to have a rule which says: if there's a 
test-specific function, don't use ->setup() but do any setup directly 
in the test-specific function but, otherwise use ->setup(). Such an
inconsistency would make things hard to track.

> > +	}
> >  
> >  	if (p->num_of_runs == 0) {
> > -		sprintf(schemata, "%lx", p->mask);
> > -		ret = write_schemata(p->ctrlgrp, schemata, p->cpu_no,
> > -				     p->resctrl_val);
> > +		snprintf(schemata, sizeof(schemata), "%lx", p->mask & ~current_mask);
> > +		ret = write_schemata("", schemata, p->cpu_no, p->resctrl_val);
> > +		if (ret)
> > +			return ret;
> > +		snprintf(schemata, sizeof(schemata), "%lx", current_mask);
> > +		ret = write_schemata(p->ctrlgrp, schemata, p->cpu_no, p->resctrl_val);
> > +		if (ret)
> > +			return ret;
> >  	}
> >  	p->num_of_runs++;
> >  
> 
> ...
> 
> > @@ -126,7 +162,7 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
> >  	ret = get_mask_no_shareable(cache_type, &long_mask);
> >  	if (ret)
> >  		return ret;
> > -	count_of_bits = count_consecutive_bits(long_mask, NULL);
> > +	count_of_bits = count_consecutive_bits(long_mask, &start);
> >  
> >  	/* Get L3/L2 cache size */
> >  	ret = get_cache_size(cpu_no, cache_type, &cache_size);
> > @@ -143,99 +179,29 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
> >  			       count_of_bits - 1);
> >  		return -1;
> >  	}
> > -
> > -	/* Get core id from same socket for running another thread */
> > -	sibling_cpu_no = get_core_sibling(cpu_no);
> 
> Do any users of get_core_sibling() remain after this?

Correct observation, there seems to be no other users after this is 
removed.

-- 
 i.