On Mon, Sep 09, 2024 at 04:50:31PM +0100, Alan Maguire wrote: > When recording function information for later comparison (as we > do when skipping inconsistent function descriptions), we utilize > DWARF-based representations because we do not want to jump the > gun and add BTF representations for functions that have inconsistent > representations across CUs (and across encoders in parallel mode). > > So to handle this, we save info about functions, and we can then add > them later once we have ensured their various representations are > in fact consistent. However to ensure that the function info > is still valid, we need to specify LSK__KEEPIT for CUs, which > bloats memory usage (in some cases to ~4Gb). This is not a good > approach, mea culpa. > > Instead, store a BTF-centric representation where we > > - store the number of parameters > - store the BTF ids of return type and parameters > - store the name of the parameters where present > - store any LLVM annotation values, component idxs if present > > So in summary, store everything we need to add the BTF_KIND_FUNC > and BTF_KIND_FUNC_PROTO and any associated annotations. This will > allow us to free CUs as we go but make it possible to add functions > later. > > For name storage we can take advantage of the fact that > BTF will avoid re-adding a name string so we btf__add_str() > to add the parameter name and store the string offset instead; > this prevents duplicate name storage while ensuring the parameter > name is in BTF. > > When we cross-compare functions for consistency, do a shallow > analysis akin to what was done with DWARF prototype comparisons; > compare base types by name, reference types by target type, > match loosely between fwds, structs and unions etc. > > When this is done, memory consumption peaks at 1Gb rather than > ~4Gb for vmlinux generation. Time taken appears to be approximately > the same for -j1, but slightly faster for multiple threads; > for example: nice! did not realize LSK__KEEPIT was that bad.. > > Baseline > > $ time pahole -J vmlinux -j1 --btf_features=default > > real 0m17.268s > user 0m15.808s > sys 0m1.415s > > $ time pahole -J vmlinux -j8 --btf_features=default > > real 0m10.768s > user 0m30.793s > sys 0m4.199s > > With these changes: > > $ time pahole -J vmlinux -j1 --btf_features=default > > real 0m16.564s > user 0m16.029s > sys 0m0.492s > > $ time pahole -J vmlinux -j8 --btf_features=default > > real 0m8.332s > user 0m30.627s > sys 0m0.714s > > In terms of functions encoded, 360 fewer functions make it into on my setup I'm getting 386 fewer fcuntions > BTF due to the different approach in consistency checking, but after > examining these cases, they do appear to be legitimately inconsistent > functions where the optimized versions have parameter mismatches > with the non-optimized expectations. I checked on one case and it seems like obvious inconsistency: static void io_serial_out(unsigned long addr, int offset, int value) static void io_serial_out(struct uart_port *p, int offset, int value) not sure how we could missed that before > > Mileage may vary of course, and any testing folks could do would > be greatly appreciated! would be nice to have some test for before/after vmlinux images, that would compare generated BTF functions thanks, jirka SNIP > static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct function *fn, struct elf_function *func) > { > - fn->priv = encoder->cu; > - if (func->function) { > - struct function *existing = func->function; > + struct btf_encoder_func_state *state = zalloc(sizeof(*state)); > + struct ftype *ftype = &fn->proto; > + struct btf *btf = encoder->btf; > + struct llvm_annotation *annot; > + struct parameter *param; > + uint8_t param_idx = 0; > + > + if (!state) > + return -ENOMEM; > + state->nr_parms = ftype->nr_parms + (ftype->unspec_parms ? 1 : 0); > + state->ret_type_id = ftype->tag.type == 0 ? 0 : encoder->type_id_off + ftype->tag.type; > + if (state->nr_parms > 0) { > + state->parms = zalloc(state->nr_parms * sizeof(*state->parms)); > + if (!state->parms) > + return -ENOMEM; > + } > + state->inconsistent_proto = ftype->inconsistent_proto; > + state->unexpected_reg = ftype->unexpected_reg; > + state->optimized_parms = ftype->optimized_parms; > + ftype__for_each_parameter(ftype, param) { > + const char *name = parameter__name(param) ?: ""; > + > + state->parms[param_idx].name_off = btf__add_str(btf, name); > + if (state->parms[param_idx].name_off < 0) > + return state->parms[param_idx].name_off; > + state->parms[param_idx].type_id = param->tag.type == 0 ? 0 : encoder->type_id_off + param->tag.type; so IIUC because functions are added as last we're sure all the used types for arguments are stored in encoder's BTF already.. for funcs__match later ? > + param_idx++; > + } > + if (ftype->unspec_parms) > + state->parms[param_idx].type_id = 0; > + > + list_for_each_entry(annot, &fn->annots, node) > + state->nr_annots++; > + if (state->nr_annots) { > + uint8_t idx = 0; > + > + state->annots = zalloc(sizeof(*state->annots)); zalloc needs sizeof(..) * state->nr_annots ? > + if (!state->annots) > + return -ENOMEM; > + list_for_each_entry(annot, &fn->annots, node) { > + state->annots[idx].value = btf__add_str(encoder->btf, annot->value); > + if (state->annots[idx].value < 0) > + return state->annots[idx].value; > + state->annots[idx].component_idx = annot->component_idx; > + idx++; > + } > + } > + > + if (func->state) { > + struct btf_encoder_func_state *existing = func->state; > > /* If saving and we find an existing entry, we want to merge > * observations across both functions, checking that the > @@ -899,98 +1097,136 @@ static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct functi > * to add the local function later (encoder + type_id_off) > * such that we can add the function later. > */ > - existing->proto.optimized_parms |= fn->proto.optimized_parms; > - existing->proto.unexpected_reg |= fn->proto.unexpected_reg; > - if (!existing->proto.unexpected_reg && !existing->proto.inconsistent_proto && > - !funcs__match(encoder, func, fn)) > - existing->proto.inconsistent_proto = 1; > + existing->optimized_parms |= state->optimized_parms; > + existing->unexpected_reg |= state->unexpected_reg; > + if (!existing->unexpected_reg && !state->inconsistent_proto && > + !funcs__match(encoder, func, encoder->btf, state, > + encoder->btf, existing)) > + existing->inconsistent_proto = 1; > + zfree(&state->annots); > + zfree(&state->parms); > + free(state); some error returns above are missing whole state cleanup > } else { > - func->state.type_id_off = encoder->type_id_off; > - func->function = fn; > - encoder->cu->functions_saved++; > + func->state = state; > } > return 0; > } > SNIP