Re: [PATCH v2 24/24] selftests/resctrl: Rewrite Cache Allocation Technology (CAT) test

Reinette Chatre <reinette.chatre@xxxxxxxxx> · Fri, 21 Apr 2023 17:32:20 -0700

Hi Ilpo,

On 4/18/2023 4:45 AM, Ilpo Järvinen wrote:
> CAT test spawns two processes into two different control groups with
> exclusive schemata. Both the processes alloc a buffer from memory
> matching their allocated LLC block size and flush the entire buffer out
> of caches. Since the processes are reading through the buffer only once
> during the measurement and initially all the buffer was flushed, the
> test isn't testing CAT.
> 
> Rewrite the CAT test to allocated a buffer sized to half of LLC. Then

"allocated a buffer" -> "allocate a buffer" ?

> perform a sequence of tests with different LLC alloc sizes starting
> from half of the CBM bits down to 1-bit CBM. Flush the buffer before
> each test and read the buffer twice. Observe the LLC misses on the
> second read through the buffer. As the allocated LLC block gets smaller
> and smaller, the LLC misses will become larger and larger giving a
> strong signal on CAT working properly.

Since the changelog starts by describing the CAT test needing two
processes I think it would help to highlight that this test uses a
single process. I think it would also help to describing how the cache
is used by the rest while this test is running.

> 
> Suggested-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
> ---
>  tools/testing/selftests/resctrl/cache.c    |  20 +-
>  tools/testing/selftests/resctrl/cat_test.c | 204 +++++++++------------
>  2 files changed, 97 insertions(+), 127 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
> index 7970239413da..64f08ba5edc2 100644
> --- a/tools/testing/selftests/resctrl/cache.c
> +++ b/tools/testing/selftests/resctrl/cache.c
> @@ -224,10 +224,10 @@ int measure_llc_resctrl(struct resctrl_val_param *param, int bm_pid)
>   */
>  int cat_val(struct resctrl_val_param *param)
>  {
> -	int memflush = 1, operation = 0, ret = 0;
>  	char *resctrl_val = param->resctrl_val;
>  	unsigned long llc_perf_miss = 0;
>  	pid_t bm_pid;
> +	int ret;
>  
>  	if (strcmp(param->filename, "") == 0)
>  		sprintf(param->filename, "stdio");
> @@ -245,6 +245,10 @@ int cat_val(struct resctrl_val_param *param)
>  	if (ret)
>  		return ret;
>  
> +	ret = alloc_buffer(param->span, 1);
> +	if (ret)
> +		return ret;
> +
>  	initialize_llc_perf();
>  
>  	/* Test runs until the callback setup() tells the test to stop. */
> @@ -256,17 +260,15 @@ int cat_val(struct resctrl_val_param *param)
>  		}
>  		if (ret < 0)
>  			break;
> +
> +		flush_buffer(param->span);
> +		use_buffer(param->span, 0, true);
> +
>  		ret = reset_enable_llc_perf(bm_pid, param->cpu_no);
>  		if (ret)
>  			break;
>  
> -		if (run_fill_buf(param->span, memflush, operation, true)) {
> -			fprintf(stderr, "Error-running fill buffer\n");
> -			ret = -1;
> -			break;
> -		}
> -
> -		sleep(1);
> +		use_buffer(param->span, 0, true);
>  
>  		/* Measure cache miss from perf */
>  		ret = get_llc_perf(&llc_perf_miss);
> @@ -279,6 +281,8 @@ int cat_val(struct resctrl_val_param *param)
>  			break;
>  	}
>  
> +	free_buffer();
> +
>  	return ret;
>  }
>  
> diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
> index 4b505fdb35d7..85053829b9c5 100644
> --- a/tools/testing/selftests/resctrl/cat_test.c
> +++ b/tools/testing/selftests/resctrl/cat_test.c
> @@ -11,11 +11,12 @@
>  #include "resctrl.h"
>  #include <unistd.h>
>  
> -#define RESULT_FILE_NAME1	"result_cat1"
> -#define RESULT_FILE_NAME2	"result_cat2"
> -#define NUM_OF_RUNS		5
> -#define MAX_DIFF_PERCENT	4
> -#define MAX_DIFF		1000000
> +#define RESULT_FILE_NAME		"result_cat"
> +#define NUM_OF_RUNS			5
> +#define MIN_DIFF_PERCENT_PER_BIT	2

Could you please start a new trend that adds documentation
that explains what this constant means and how it was chosen?

> +
> +static unsigned long current_mask;
> +static long prev_avg_llc_val;
>  
>  /*
>   * Change schemata. Write schemata to specified
> @@ -28,13 +29,24 @@ static int cat_setup(struct resctrl_val_param *p)
>  	int ret = 0;
>  
>  	/* Run NUM_OF_RUNS times */
> -	if (p->num_of_runs >= NUM_OF_RUNS)
> -		return END_OF_TESTS;
> +	if (p->num_of_runs >= NUM_OF_RUNS) {
> +		/* Remove one bit from the consecutive block */
> +		current_mask &= current_mask >> 1;
> +		if (!current_mask)
> +			return END_OF_TESTS;
> +
> +		p->num_of_runs = 0;

This seems like a workaround to get the schemata to be written. It is
problematic since now p->num_of_runs no longer accurately reflects the
number of test runs. I was expecting this mask manipulation to be
in cat_val() so that it is clear how test works instead of part
of the logic handled here.

> +	}
>  
>  	if (p->num_of_runs == 0) {
> -		sprintf(schemata, "%lx", p->mask);
> -		ret = write_schemata(p->ctrlgrp, schemata, p->cpu_no,
> -				     p->resctrl_val);
> +		snprintf(schemata, sizeof(schemata), "%lx", p->mask & ~current_mask);
> +		ret = write_schemata("", schemata, p->cpu_no, p->resctrl_val);
> +		if (ret)
> +			return ret;
> +		snprintf(schemata, sizeof(schemata), "%lx", current_mask);
> +		ret = write_schemata(p->ctrlgrp, schemata, p->cpu_no, p->resctrl_val);
> +		if (ret)
> +			return ret;
>  	}
>  	p->num_of_runs++;
>  

...

> @@ -126,7 +162,7 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>  	ret = get_mask_no_shareable(cache_type, &long_mask);
>  	if (ret)
>  		return ret;
> -	count_of_bits = count_consecutive_bits(long_mask, NULL);
> +	count_of_bits = count_consecutive_bits(long_mask, &start);
>  
>  	/* Get L3/L2 cache size */
>  	ret = get_cache_size(cpu_no, cache_type, &cache_size);
> @@ -143,99 +179,29 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>  			       count_of_bits - 1);
>  		return -1;
>  	}
> -
> -	/* Get core id from same socket for running another thread */
> -	sibling_cpu_no = get_core_sibling(cpu_no);

Do any users of get_core_sibling() remain after this?

Reinette