Re: [PATCH v2 bpf-next 1/2] bpftool: introduce "prog profile" command

Song Liu <songliubraving@xxxxxx> · Tue, 3 Mar 2020 00:10:50 +0000

> On Mar 1, 2020, at 8:24 PM, Yonghong Song <yhs@xxxxxx> wrote:
>> +	},
>> +	{
>> +		.name = "instructions",
>> +		.attr = {
>> +			.freq = 0,
>> +			.sample_period = SAMPLE_PERIOD,
>> +			.inherit = 0,
>> +			.type = PERF_TYPE_HARDWARE,
>> +			.read_format = 0,
>> +			.sample_type = 0,
>> +			.config = PERF_COUNT_HW_INSTRUCTIONS,
>> +		},
>> +		.ratio_metric = 1,
>> +		.ratio_mul = 1.0,
>> +		.ratio_desc = "insn per cycle",
>> +	},
>> +	{
>> +		.name = "l1d_loads",
>> +		.attr = {
>> +			.freq = 0,
>> +			.sample_period = SAMPLE_PERIOD,
>> +			.inherit = 0,
>> +			.type = PERF_TYPE_HW_CACHE,
>> +			.read_format = 0,
>> +			.sample_type = 0,
>> +			.config =
>> +				PERF_COUNT_HW_CACHE_L1D |
>> +				(PERF_COUNT_HW_CACHE_OP_READ << 8) |
>> +				(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16),
>> +		},
> 
> why we do not have metric here?

This follows perf-stat design: some events have another event to compare 
against, like instructions per cycle, etc. 

> 
>> +	},
>> +	{
>> +		.name = "llc_misses",
>> +		.attr = {
>> +			.freq = 0,
>> +			.sample_period = SAMPLE_PERIOD,
>> +			.inherit = 0,
>> +			.type = PERF_TYPE_HW_CACHE,
>> +			.read_format = 0,
>> +			.sample_type = 0,
>> +			.config =
>> +				PERF_COUNT_HW_CACHE_LL |
>> +				(PERF_COUNT_HW_CACHE_OP_READ << 8) |
>> +				(PERF_COUNT_HW_CACHE_RESULT_MISS << 16),
>> +		},
>> +		.ratio_metric = 2,
>> +		.ratio_mul = 1e6,
>> +		.ratio_desc = "LLC misses per million isns",
>> +	},
>> +};
> 
> icache miss and itlb miss might be useful as well as the code will jump
> to a different physical page. I think we should addd them. dtlb_miss
> probably not a big problem, but it would be good to be an option.

I plan to add more events later on. 

> 
> For ratio_metric, we explicitly assign a slot here. Any specific reason?
> We can just say this metric *permits* ratio_metric and then ratio_matric
> is assigned dynamically by the user command line options?
> 
> I am thinking how we could support *all* metrics the underlying system
> support based on `perf list`. This can be the future work though.

We are also thinking about adding similar functionality to perf-stat, 
which will be more flexible. 

> 
>> +
>> +u64 profile_total_count;
> 
[...]
>> +
>> +	reading_map_fd = bpf_map__fd(obj->maps.accum_readings);
>> +	count_map_fd = bpf_map__fd(obj->maps.counts);
>> +	if (reading_map_fd < 0 || count_map_fd < 0) {
>> +		p_err("failed to get fd for map");
>> +		return;
>> +	}
>> +
>> +	assert(bpf_map_lookup_elem(count_map_fd, &key, counts) == 0);
> 
> In the patch, I see sometime bpf_map_lookup_elem() result is checked
> with failure being handled. Sometimes, assert() is used. Maybe be
> consistent with checking result approach?

Will fix. 

[...]
> 
>> 
>> +}
>> +
>> +#define PROFILE_DEFAULT_LONG_DURATION (3600 * 24)
> 
> We need to let user know this value in "help" at least.
> In "man" page it may be get updated but I think we probably
> should add there as well.

I am planning to just use UINT_MAX. 

> 
>> +
[...]

>> +#define BPF_PROG(name, args...)						    \
>> +name(unsigned long long *ctx);						    \
>> +static __always_inline typeof(name(0))					    \
>> +____##name(unsigned long long *ctx, ##args);				    \
>> +typeof(name(0)) name(unsigned long long *ctx)				    \
>> +{									    \
>> +	_Pragma("GCC diagnostic push")					    \
>> +	_Pragma("GCC diagnostic ignored \"-Wint-conversion\"")		    \
>> +	return ____##name(___bpf_ctx_cast(args));			    \
>> +	_Pragma("GCC diagnostic pop")					    \
>> +}									    \
>> +static __always_inline typeof(name(0))					    \
>> +____##name(unsigned long long *ctx, ##args)
> 
> I know it is internal. But all the above macros are not great in
> a bpf program. If we can reuse/amend current infrastructure.
> That will be great. It may benefit users writing a similar
> bpf program to here.

This is copied from tools/testing/selftests/bpf/bpf_trace_helpers.h. 
I think we will move them to libbpf later. Then we can use that 
version instead. 

Thanks,
Song