On Mon, 2022-01-24 at 12:13 -0800, Andrii Nakryiko wrote:
> On Mon, Jan 24, 2022 at 11:19 AM Kui-Feng Lee <kuifeng@xxxxxx> wrote:
> >
> > Create an instance of btf for each worker thread, and add type info
> > to the local btf instance in the steal function of pahole without
> > acquiring the mutex. Once all worker threads have finished, merge
> > all per-thread btf instances into the primary btf instance.
> >
> > Signed-off-by: Kui-Feng Lee <kuifeng@xxxxxx>
> > ---
>
> BTW, I've already synced your btf__add_btf() optimization to Github,
> so you can bump pahole's libbpf submodule reference in the next
> revision. You'll get a bunch of deprecation warnings, so it would be
> great to fix those at the same time.

Got it!

...... cut .....

> > +static int pahole_thread_exit(struct conf_load *conf, void *thr_data)
> > +{
> > +	struct thread_data *thread = thr_data;
> > +
> > +	if (thread == NULL)
> > +		return 0;
> > +
> > +	/*
> > +	 * Here we will call btf__dedup() once we extend
> > +	 * btf__dedup().
> > +	 */
> > +
> > +	return 0;
> > +}
> > +
>
> You've ignored my question and suggestion to move btf__add_btf() into
> pahole_thread_exit. Can you please comment on why it's not a good
> idea, if you insist on not doing that?

Sorry for not replying to this part. I tried it yesterday and did not
see an obvious improvement; I guess the gain is too small to be
observed.

> > @@ -2819,8 +2886,8 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
> >
> >  	if (btf_encode) {
> >  		static pthread_mutex_t btf_lock = PTHREAD_MUTEX_INITIALIZER;
> > +		struct btf_encoder *encoder;
> >
> > -		pthread_mutex_lock(&btf_lock);
> >  		/*
> >  		 * FIXME:
> >  		 *
> > @@ -2828,21 +2895,55 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
> >  		 * point we'll have cu->elf setup...
> >  		 */
> >  		if (!btf_encoder) {
> > -			btf_encoder = btf_encoder__new(cu, detached_btf_filename, conf_load->base_btf, skip_encoding_btf_vars,
> > -						       btf_encode_force, btf_gen_floats, global_verbose);
> > +			pthread_mutex_lock(&btf_lock);
> > +			if (!btf_encoder) {
>
> are you trying to minimize lock contention here with the !btf_encoder
> check outside and inside the locked area? Why? It just adds more code
> nesting, and this locking can't be a real bottleneck. We are talking
> about very few threads being initialized. Please keep it simple:
> lock, check !btf_encoder and goto unlock if it's already set;
> otherwise proceed to initialization.

Got it!

.......................

> > +				if (btf_encoder && thr_data) {
> > +					struct thread_data *thread = (struct thread_data *)thr_data;
> > +
> > +					thread->encoder = btf_encoder;
> > +					thread->btf = btf_encoder__btf(btf_encoder);
>
> Can you summarize the relationship between thr_data, btf_encoder and
> thread->encoder? I'm re-reading this code (including the `if (thr_data)`
> piece below) over and over and can't figure out why the
> initialization pattern is so complicated.

thr_data keeps the per-thread information for the worker threads, so
every worker thread has its own instance holding an encoder. The main
thread uses btf_encoder to encode BTF, while thread->encoder is what a
worker thread uses to add type info. Once all worker threads have
finished, the main thread adds the btf instances behind the
thread->encoders to the btf instance of btf_encoder. However, I reuse
btf_encoder itself for the first worker thread that reaches this
function and creates btf_encoder, which avoids copying data for that
thread. Does it make sense? If so, I will put the explanation in the
code; a rough sketch of the comment is below.
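Roughly, the comment and layout I have in mind look like the sketch
below. The two fields follow the patch (thread->btf and
thread->encoder); the comment wording is only a draft, and struct btf
and struct btf_encoder come from libbpf's <bpf/btf.h> and pahole's
btf_encoder.h respectively:

	/*
	 * Per-worker state passed to pahole_stealer() and
	 * pahole_thread_exit() via thr_data.  The first worker that
	 * creates the shared btf_encoder reuses it directly
	 * (thread->encoder == btf_encoder), so its types need no extra
	 * copy.  Every other worker gets its own encoder, and the main
	 * thread merges each worker's BTF into btf_encoder's BTF once
	 * the workers have exited.
	 */
	struct thread_data {
		struct btf *btf;		/* BTF this worker adds types to */
		struct btf_encoder *encoder;	/* worker-local encoder, or the
						 * shared btf_encoder for the
						 * first worker */
	};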
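Also, to make sure I read the locking suggestion correctly, the
initialization would take the shape below. This is a standalone sketch
rather than the actual pahole_stealer() code, and btf_encoder__create()
is only a stand-in for the real btf_encoder__new() call with all its
arguments:

	#include <pthread.h>
	#include <stddef.h>

	struct btf_encoder;			/* opaque, as in pahole */

	static pthread_mutex_t btf_lock = PTHREAD_MUTEX_INITIALIZER;
	static struct btf_encoder *btf_encoder;

	/* Stub only; the real code calls btf_encoder__new(cu, ...) here. */
	static struct btf_encoder *btf_encoder__create(void)
	{
		return NULL;	/* NULL only because this is a stub */
	}

	static struct btf_encoder *get_or_create_encoder(void)
	{
		pthread_mutex_lock(&btf_lock);
		if (btf_encoder)
			goto out;	/* another thread already set it up */

		btf_encoder = btf_encoder__create();
	out:
		pthread_mutex_unlock(&btf_lock);
		return btf_encoder;
	}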
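For the merge done by the main thread after the workers exit, the plan
stays with libbpf's btf__add_btf(), roughly like the helper below.
merge_worker_btf() is only an illustrative name for this sketch, not
code from the patch:

	#include <bpf/btf.h>

	static int merge_worker_btf(struct btf *main_btf, struct btf *worker_btf)
	{
		/*
		 * btf__add_btf() appends every type from worker_btf to
		 * main_btf and remaps type IDs; it returns the new ID of
		 * the first appended type, or a negative error code.
		 */
		int first_id = btf__add_btf(main_btf, worker_btf);

		if (first_id < 0)
			return first_id;

		btf__free(worker_btf);	/* per-thread copy is no longer needed */
		return 0;
	}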