Re: [RFC dwarves] btf_encoder: record BTF-centric function state instead of DWARF-centric

Jiri Olsa <olsajiri@xxxxxxxxx> · Mon, 9 Sep 2024 22:01:45 +0200

On Mon, Sep 09, 2024 at 04:50:31PM +0100, Alan Maguire wrote:
> When recording function information for later comparison (as we
> do when skipping inconsistent function descriptions), we utilize
> DWARF-based representations because we do not want to jump the
> gun and add BTF representations for functions that have inconsistent
> representations across CUs (and across encoders in parallel mode).
> 
> So to handle this, we save info about functions, and we can then add
> them later once we have ensured their various representations are
> in fact consistent.  However to ensure that the function info
> is still valid, we need to specify LSK__KEEPIT for CUs, which
> bloats memory usage (in some cases to ~4Gb).  This is not a good
> approach, mea culpa.
> 
> Instead, store a BTF-centric representation where we
> 
> - store the number of parameters
> - store the BTF ids of return type and parameters
> - store the name of the parameters where present
> - store any LLVM annotation values, component idxs if present
> 
> So in summary, store everything we need to add the BTF_KIND_FUNC
> and BTF_KIND_FUNC_PROTO and any associated annotations.  This will
> allow us to free CUs as we go but make it possible to add functions
> later.
> 
> For name storage we can take advantage of the fact that
> BTF will avoid re-adding a name string so we btf__add_str()
> to add the parameter name and store the string offset instead;
> this prevents duplicate name storage while ensuring the parameter
> name is in BTF.
> 
> When we cross-compare functions for consistency, do a shallow
> analysis akin to what was done with DWARF prototype comparisons;
> compare base types by name, reference types by target type,
> match loosely between fwds, structs and unions etc.
> 
> When this is done, memory consumption peaks at 1Gb rather than
> ~4Gb for vmlinux generation.  Time taken appears to be approximately
> the same for -j1, but slightly faster for multiple threads;
> for example:

nice! did not realize LSK__KEEPIT was that bad..

> 
> Baseline
> 
> $ time pahole -J vmlinux -j1 --btf_features=default
> 
> real	0m17.268s
> user	0m15.808s
> sys	0m1.415s
> 
> $ time pahole -J vmlinux -j8 --btf_features=default
> 
> real	0m10.768s
> user	0m30.793s
> sys	0m4.199s
> 
> With these changes:
> 
> $ time pahole -J vmlinux -j1 --btf_features=default
> 
> real	0m16.564s
> user	0m16.029s
> sys	0m0.492s
> 
> $ time pahole -J vmlinux -j8 --btf_features=default
> 
> real	0m8.332s
> user	0m30.627s
> sys	0m0.714s
> 
> In terms of functions encoded, 360 fewer functions make it into

on my setup I'm getting 386 fewer fcuntions

> BTF due to the different approach in consistency checking, but after
> examining these cases, they do appear to be legitimately inconsistent
> functions where the optimized versions have parameter mismatches
> with the non-optimized expectations.

I checked on one case and it seems like obvious inconsistency:

  static void io_serial_out(unsigned long addr, int offset, int value)
  static void io_serial_out(struct uart_port *p, int offset, int value)

not sure how we could missed that before

> 
> Mileage may vary of course, and any testing folks could do would
> be greatly appreciated!

would be nice to have some test for before/after vmlinux images,
that would compare generated BTF functions

thanks,
jirka

SNIP

>  static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct function *fn, struct elf_function *func)
>  {
> -	fn->priv = encoder->cu;
> -	if (func->function) {
> -		struct function *existing = func->function;
> +	struct btf_encoder_func_state *state = zalloc(sizeof(*state));
> +	struct ftype *ftype = &fn->proto;
> +	struct btf *btf = encoder->btf;
> +	struct llvm_annotation *annot;
> +	struct parameter *param;
> +	uint8_t param_idx = 0;
> +
> +	if (!state)
> +		return -ENOMEM;
> +	state->nr_parms = ftype->nr_parms + (ftype->unspec_parms ? 1 : 0);
> +	state->ret_type_id = ftype->tag.type == 0 ? 0 : encoder->type_id_off + ftype->tag.type;
> +	if (state->nr_parms > 0) {
> +		state->parms = zalloc(state->nr_parms * sizeof(*state->parms));
> +		if (!state->parms)
> +			return -ENOMEM;
> +	}
> +	state->inconsistent_proto = ftype->inconsistent_proto;
> +	state->unexpected_reg = ftype->unexpected_reg;
> +	state->optimized_parms = ftype->optimized_parms;
> +	ftype__for_each_parameter(ftype, param) {
> +		const char *name = parameter__name(param) ?: "";
> +
> +		state->parms[param_idx].name_off = btf__add_str(btf, name);
> +		if (state->parms[param_idx].name_off < 0)
> +			return state->parms[param_idx].name_off;
> +		state->parms[param_idx].type_id = param->tag.type == 0 ? 0 : encoder->type_id_off + param->tag.type;

so IIUC because functions are added as last we're sure all the used types
for arguments are stored in encoder's BTF already.. for funcs__match later ?

> +		param_idx++;
> +	}
> +	if (ftype->unspec_parms)
> +		state->parms[param_idx].type_id = 0;
> +
> +	list_for_each_entry(annot, &fn->annots, node)
> +		state->nr_annots++;
> +	if (state->nr_annots) {
> +		uint8_t idx = 0;
> +
> +		state->annots = zalloc(sizeof(*state->annots));

zalloc needs sizeof(..) * state->nr_annots ?

> +		if (!state->annots)
> +			return -ENOMEM;
> +		list_for_each_entry(annot, &fn->annots, node) {
> +			state->annots[idx].value = btf__add_str(encoder->btf, annot->value);
> +			if (state->annots[idx].value < 0)
> +				return state->annots[idx].value;
> +			state->annots[idx].component_idx = annot->component_idx;
> +			idx++;
> +		}
> +	}
> +
> +	if (func->state) {
> +		struct btf_encoder_func_state *existing = func->state;
>  
>  		/* If saving and we find an existing entry, we want to merge
>  		 * observations across both functions, checking that the
> @@ -899,98 +1097,136 @@ static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct functi
>  		 * to add the local function later (encoder + type_id_off)
>  		 * such that we can add the function later.
>  		 */
> -		existing->proto.optimized_parms |= fn->proto.optimized_parms;
> -		existing->proto.unexpected_reg |= fn->proto.unexpected_reg;
> -		if (!existing->proto.unexpected_reg && !existing->proto.inconsistent_proto &&
> -		     !funcs__match(encoder, func, fn))
> -			existing->proto.inconsistent_proto = 1;
> +		existing->optimized_parms |= state->optimized_parms;
> +		existing->unexpected_reg |= state->unexpected_reg;
> +		if (!existing->unexpected_reg && !state->inconsistent_proto &&
> +		     !funcs__match(encoder, func, encoder->btf, state,
> +				   encoder->btf, existing))
> +			existing->inconsistent_proto = 1;
> +		zfree(&state->annots);
> +		zfree(&state->parms);
> +		free(state);

some error returns above are missing whole state cleanup

>  	} else {
> -		func->state.type_id_off = encoder->type_id_off;
> -		func->function = fn;
> -		encoder->cu->functions_saved++;
> +		func->state = state;
>  	}
>  	return 0;
>  }
>  

SNIP