On Tue, 2022-01-25 at 22:07 -0800, Andrii Nakryiko wrote:
> On Tue, Jan 25, 2022 at 8:07 PM Kui-Feng Lee <kuifeng@xxxxxx> wrote:
> >
> > Create an instance of btf for each worker thread, and add type info
> > to the local btf instance in the steal-function of pahole without
> > mutex acquiring. Once finished with all worker threads, merge all
> > per-thread btf instances to the primary btf instance.
> >
> > Signed-off-by: Kui-Feng Lee <kuifeng@xxxxxx>
> > ............ cut ...........
> > +        struct thread_data *thread = thr_data;
> > +
> > +        if (thread == NULL)
> > +                return 0;
> > +
> > +        /*
> > +         * Here we will call btf__dedup() here once we extend
> > +         * btf__dedup().
> > +         */
> > +
> > +        if (thread->encoder == btf_encoder) {
> > +                /* Release the lock acuqired when created btf_encoder */
>
> typo: acquired
>
> > +                pthread_mutex_unlock(&btf_encoder_lock);
>
> Splitting pthread_mutex lock/unlock like this is extremely dangerous
> and error prone. If that's the price for reusing global btf_encoder
> for first worker, then I'd rather not reuse btf_encoder or revert back
> to doing btf__add_btf() and doing btf_encoder__delete() in the main
> thread.
>
> Please question and push back the approach and code review feedback
> if something doesn't feel natural or is causing more problems than
> solves.
>
> I think the cleanest solution would be to not reuse global btf_encoder
> for the first worker. I suspect the time difference isn't big, so I'd
> optimize for simplicity and clean separation. But if it is causing
> unnecessary slowdown, then as I said, let's just revert back to your
> previous approach with doing btf__add_btf() in the main thread.
>

Your concerns make sense. I tried the solutions with and without reusing
btf_encoder, and the difference is obvious, so I will roll back to
calling btf__add_btf() in the main thread.

w/o reusing:
  AVG 5.78467   P50 5.88   P70 6.03   P90 6.10

w/ reusing:
  AVG 5.304     P50 5.12   P70 5.17   P90 5.46
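
For reference, below is a rough sketch (not the actual pahole patch) of what
the main-thread merge could look like. The struct thread_data layout and the
merge_worker_btfs() helper are made up for illustration; btf__add_btf() and
btf__free() are the libbpf calls the approach relies on.

#include <bpf/btf.h>

struct thread_data {
        struct btf *btf;        /* per-thread BTF, filled without locking */
};

/* Called from the main thread after all workers have been joined. */
static int merge_worker_btfs(struct btf *primary,
                             struct thread_data *threads, int nr_threads)
{
        int i, err;

        for (i = 0; i < nr_threads; i++) {
                if (!threads[i].btf)
                        continue;

                /*
                 * btf__add_btf() appends every type from the per-thread
                 * BTF to the primary one, remapping type IDs; no mutex is
                 * needed since only the main thread runs at this point.
                 */
                err = btf__add_btf(primary, threads[i].btf);
                if (err < 0)
                        return err;

                btf__free(threads[i].btf);
                threads[i].btf = NULL;
        }

        return 0;
}

Deduplication would still run afterwards on the merged primary BTF, as the
comment in the quoted patch hints.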