On 25/01/2023 17:54, Kui-Feng Lee wrote:
>
> On 1/24/23 05:45, Alan Maguire wrote:
>> +/*
>> + * static functions with suffixes are not added yet - we need to
>> + * observe across all CUs to see if the static function has
>> + * optimized parameters in any CU, since in such a case it should
>> + * not be included in the final BTF. NF_HOOK.constprop.0() is
>> + * a case in point - it has optimized-out parameters in some CUs
>> + * but not others. In order to have consistency (since we do not
>> + * know which instance the BTF-specified function signature will
>> + * apply to), we simply skip adding functions which have optimized
>> + * out parameters anywhere.
>> + */
>> +static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct function *fn)
>> +{
>> +	struct btf_encoder *parent = encoder->parent ? encoder->parent : encoder;
>> +	const char *name = function__name(fn);
>> +	struct function **nodep;
>> +	int ret = 0;
>> +
>> +	pthread_mutex_lock(&parent->saved_func_lock);
>
> Do you have the number of static functions with suffices?
>

There are a few thousand, and around 25000 static functions overall
("."-suffixed are all static) that will participate in the tree
representations (see patch 5). This equates to roughly half of the
vmlinux BTF functions.

> If the number of static functions with suffices is high, the contention of the lock would be an issue.
>
> Is it possible to keep a local pool of static functions with suffices? The pool will be combined with its parent either at the completion of a CU, before ending the thread or when merging into the main thread.
>

It's possible alright. I'll try to lay out the possibilities so we can
figure out the best way forward.

Option 1: global tree of static functions, created during DWARF loading

Pro: Quick addition/lookup, we can flag optimizations or inconsistent
prototypes as we encounter them.
Con: Lock contention between encoder threads.

Option 2: store static functions in a per-encoder tree, traverse them all
prior to BTF merging to eliminate unwanted functions

Pro: limits contention.
Con: for each static function in each encoder, we need to look it up in
all other encoder trees. In option 1 we paid that price as the function
was added, here we pay it later on prior to merging. So processing here
is O(number_functions * num_encoders). There may be a cleverer way to
handle this but I can't see it right now.

There may be other approaches to this of course, but these were the two I
could come up with. What do you think?

Alan

>
>> +	nodep = tsearch(fn, &parent->saved_func_tree, function__compare);
>> +	if (nodep == NULL) {
>> +		fprintf(stderr, "error: out of memory adding local function '%s'\n",
>> +			name);
>> +		ret = -1;
>> +		goto out;
>> +	}
>> +	/* If we find an existing entry, we want to merge observations
>> +	 * across both functions, checking that the "seen optimized-out
>> +	 * parameters" status is reflected in our tree entry.
>> +	 * If the entry is new, record encoder state required
>> +	 * to add the local function later (encoder + type_id_off)
>> +	 * such that we can add the function later.
>> +	 */
>> +	if (*nodep != fn) {
>> +		(*nodep)->proto.optimized_parms |= fn->proto.optimized_parms;
>> +	} else {
>> +		struct btf_encoder_state *state = zalloc(sizeof(*state));
>> +
>> +		if (state == NULL) {
>> +			fprintf(stderr, "error: out of memory adding local function '%s'\n",
>> +				name);
>> +			ret = -1;
>> +			goto out;
>> +		}
>> +		state->encoder = encoder;
>> +		state->type_id_off = encoder->type_id_off;
>> +		fn->priv = state;
>> +		encoder->saved_func_cnt++;
>> +		if (encoder->verbose)
>> +			printf("added local function '%s'%s\n", name,
>> +			       fn->proto.optimized_parms ?
>> +			       ", optimized-out params" : "");
>> +	}
>> +out:
>> +	pthread_mutex_unlock(&parent->saved_func_lock);
>> +	return ret;
>> +}
>> +
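
For reference, the local-pool idea quoted above (each thread collects its
own static functions, then folds them into the parent) could look roughly
like the sketch below. It is only an illustration under assumptions:
struct func_obs, struct func_pool, struct func_shared and the pool_*() /
shared_init() helpers are hypothetical names that do not exist in pahole,
names are stored by pointer (the caller must keep them alive), and the
encoder/type_id_off bookkeeping from the patch is left out. Each thread
adds to its own pool without locking; the shared lock is taken once per
merge rather than once per function.

```c
#include <assert.h>
#include <pthread.h>
#include <search.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for pahole's struct function. */
struct func_obs {
	const char *name;	/* not copied; caller keeps it alive */
	bool optimized_parms;	/* seen with optimized-out params anywhere */
};

/* Hypothetical pool; a thread-local one is touched only by its owner. */
struct func_pool {
	void *tree;		/* tsearch() root, keyed by name */
	struct func_obs **items;	/* insertion order, used when merging */
	size_t cnt, cap;
};

/* Hypothetical shared state; the lock is held once per merge. */
struct func_shared {
	pthread_mutex_t lock;
	struct func_pool pool;
};

int func_obs_cmp(const void *a, const void *b)
{
	const struct func_obs *fa = a, *fb = b;

	return strcmp(fa->name, fb->name);
}

void shared_init(struct func_shared *shared)
{
	memset(shared, 0, sizeof(*shared));
	pthread_mutex_init(&shared->lock, NULL);
}

/* Record one observation without locking. Returns 0, or -1 on OOM. */
int pool_add(struct func_pool *pool, const char *name, bool optimized)
{
	struct func_obs key = { .name = name };
	void *found = tfind(&key, &pool->tree, func_obs_cmp);
	struct func_obs *obs;

	if (found) {
		/* Existing entry: merge the "optimized-out" observation. */
		(*(struct func_obs **)found)->optimized_parms |= optimized;
		return 0;
	}
	if (pool->cnt == pool->cap) {
		size_t cap = pool->cap ? pool->cap * 2 : 16;
		struct func_obs **items = realloc(pool->items, cap * sizeof(*items));

		if (!items)
			return -1;
		pool->items = items;
		pool->cap = cap;
	}
	obs = malloc(sizeof(*obs));
	if (!obs)
		return -1;
	obs->name = name;
	obs->optimized_parms = optimized;
	if (!tsearch(obs, &pool->tree, func_obs_cmp)) {
		free(obs);
		return -1;
	}
	pool->items[pool->cnt++] = obs;
	return 0;
}

/* Fold a finished local pool into the shared one under a single lock. */
int pool_merge(struct func_shared *shared, const struct func_pool *local)
{
	int ret = 0;

	assert(shared && local);
	pthread_mutex_lock(&shared->lock);
	for (size_t i = 0; i < local->cnt && ret == 0; i++)
		ret = pool_add(&shared->pool, local->items[i]->name,
			       local->items[i]->optimized_parms);
	pthread_mutex_unlock(&shared->lock);
	return ret;
}

/* True if the named function was seen with optimized-out params. */
bool pool_has_optimized(struct func_pool *pool, const char *name)
{
	struct func_obs key = { .name = name };
	void *found = tfind(&key, &pool->tree, func_obs_cmp);

	return found && (*(struct func_obs **)found)->optimized_parms;
}
```

In this sketch a merge costs one tree lookup per local entry, so each
function is looked up once in the shared tree rather than once in every
other encoder's tree. Whether that actually beats the single global tree
would need measuring; the threads still serialize on the lock while
merging.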