On Thursday, December 19th, 2024 at 6:59 AM, Jiri Olsa <olsajiri@xxxxxxxxx> wrote:

>
>
> On Fri, Dec 13, 2024 at 10:37:34PM +0000, Ihor Solodrai wrote:
>
> SNIP
>
> > +static void *dwarf_loader__worker_thread(void *arg)
> > +{
> > +        struct cu_processing_job *job;
> > +        struct dwarf_cus *dcus = arg;
> > +        bool stop = false;
> > +        struct cu *cu;
> > +
> > +        while (!stop) {
> > +                job = cus_queue__dequeue_job();
> > +
> > +                switch (job->type) {
> > +
> > +                case JOB_DECODE:
> > +                        cu = dwarf_loader__decode_next_cu(dcus);
> > +
> > +                        if (cu == NULL) {
> > +                                free(job);
> > +                                stop = true;
> > +                                break;
> > +                        }
> > +
> > +                        /* Create and enqueue a new JOB_STEAL for this decoded CU */
> > +                        struct cu_processing_job *steal_job = calloc(1, sizeof(*steal_job));
>
> missing steal_job != NULL check

In the next version, job objects are allocated only by the main thread
and are reused when enqueued [1].

>
> SNIP
>
> > -static int dwarf_cus__threaded_process_cus(struct dwarf_cus *dcus)
> > +static int dwarf_cus__process_cus(struct dwarf_cus *dcus)
> >  {
> > -        pthread_t threads[dcus->conf->nr_jobs];
> > -        struct dwarf_thread dthr[dcus->conf->nr_jobs];
> > -        void *thread_data[dcus->conf->nr_jobs];
> > -        int res;
> > -        int i;
> > +        int nr_workers = dcus->conf->nr_jobs > 0 ? dcus->conf->nr_jobs : 1;
> > +        pthread_t workers[nr_workers];
> > +        struct cu_processing_job *job;
> >
> > -        if (dcus->conf->threads_prepare) {
> > -                res = dcus->conf->threads_prepare(dcus->conf, dcus->conf->nr_jobs, thread_data);
> > -                if (res != 0)
> > -                        return res;
> > -        } else {
> > -                memset(thread_data, 0, sizeof(void *) * dcus->conf->nr_jobs);
> > +        cus_queue__init(nr_workers * 4);
>
> why '* 4' ?

This is an arbitrary limit, described in the comments.

If we allow the workers to pick up the next CU for decoding as soon as
it's ready, memory usage may greatly increase if the stealer can't keep
up with the incoming work. To avoid this, there needs to be a limit on
how many decoded, but not yet stolen, CUs we hold in memory. When this
limit is reached, the workers wait for more CUs to get stolen.

N x 4 is a number I picked after trying various values and looking at
the resulting memory usage. We could make it configurable, but this
value doesn't look to me like a reasonable user-facing option. Maybe we
could add an "I don't care about memory usage" flag to pahole? wdyt?
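To make the backpressure idea concrete, here is a rough standalone
sketch (untested; names like wait_for_room/mark_cu_stolen/decoded_cus
are made up for the illustration, this is not the queue code from the
patch):

#include <pthread.h>

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t room_available = PTHREAD_COND_INITIALIZER;
static unsigned int decoded_cus;           /* decoded, but not yet stolen */
static unsigned int max_decoded_cus = 16;  /* e.g. nr_workers * 4 */

/* a worker calls this before decoding another CU */
static void wait_for_room(void)
{
        pthread_mutex_lock(&queue_lock);
        while (decoded_cus >= max_decoded_cus)
                pthread_cond_wait(&room_available, &queue_lock);
        decoded_cus++;
        pthread_mutex_unlock(&queue_lock);
}

/* called after a decoded CU has been stolen */
static void mark_cu_stolen(void)
{
        pthread_mutex_lock(&queue_lock);
        decoded_cus--;
        pthread_cond_signal(&room_available);
        pthread_mutex_unlock(&queue_lock);
}

So decoding can run ahead of stealing by at most max_decoded_cus CUs,
which is what bounds the memory usage; the condition is rechecked in a
while loop to cope with spurious wakeups.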
>
> > +
> > +        /* fill up the queue with nr_workers JOB_DECODE jobs */
> > +        for (int i = 0; i < nr_workers; i++) {
> > +                job = calloc(1, sizeof(*job));
>
> missing job != NULL check
>
> > +                job->type = JOB_DECODE;
> > +                /* no need for locks, workers were not started yet */
> > +                list_add(&job->node, &cus_processing_queue.jobs);
> >          }
> >
> > -        for (i = 0; i < dcus->conf->nr_jobs; ++i) {
> > -                dthr[i].dcus = dcus;
> > -                dthr[i].data = thread_data[i];
> > +        if (dcus->error)
> > +                return dcus->error;
> >
> > -                dcus->error = pthread_create(&threads[i], NULL,
> > -                                             dwarf_cus__process_cu_thread,
> > -                                             &dthr[i]);
> > +        for (int i = 0; i < nr_workers; ++i) {
> > +                dcus->error = pthread_create(&workers[i], NULL,
> > +                                             dwarf_loader__worker_thread,
> > +                                             dcus);
> >                  if (dcus->error)
> >                          goto out_join;
> >          }
> > @@ -3596,54 +3766,19 @@ static int dwarf_cus__threaded_process_cus(struct dwarf_cus *dcus)
> >          dcus->error = 0;
> >
> >  out_join:
> > -        while (--i >= 0) {
> > +        for (int i = 0; i < nr_workers; ++i) {
>
> I think you should keep the original while loop to cleanup/wait only for
> threads that we actually created

Do you mean in case of an error from pthread_create? Ok.
(A rough sketch of what I'd keep is at the end of this mail.)

> >                  void *res;
> > -                int err = pthread_join(threads[i], &res);
> > +                int err = pthread_join(workers[i], &res);
> >
> >                  if (err == 0 && res != NULL)
> >                          dcus->error = (long)res;
> >          }
> >
> > -        if (dcus->conf->threads_collect) {
> > -                res = dcus->conf->threads_collect(dcus->conf, dcus->conf->nr_jobs,
> > -                                                  thread_data, dcus->error);
> > -                if (dcus->error == 0)
> > -                        dcus->error = res;
> > -        }
> > +        cus_queue__destroy();
> >
> >          return dcus->error;
> >  }
>
> SNIP

[1] https://github.com/acmel/dwarves/commit/5278adbe5cb796c7baafb110d8c5cda107ec9d68#diff-77fe7eedce76594a6c71363b22832723fc086a8e72debd0370763e4193704b1eR3706-R3708
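And regarding the out_join loop: a standalone sketch of the "join only
the threads we actually created" pattern, with generic names (worker()
and start_and_join_workers() are made up for illustration, not the
actual dwarf_cus__process_cus() code):

#include <pthread.h>

static void *worker(void *arg)
{
        (void)arg;        /* a real worker would process jobs here */
        return NULL;
}

static int start_and_join_workers(int nr_workers)
{
        pthread_t workers[nr_workers];
        int error = 0;
        int i;

        for (i = 0; i < nr_workers; ++i) {
                error = pthread_create(&workers[i], NULL, worker, NULL);
                if (error)
                        goto out_join;
        }
out_join:
        /* if pthread_create() failed at index i, only workers[0..i-1] exist */
        while (--i >= 0) {
                void *res;
                int err = pthread_join(workers[i], &res);

                if (err == 0 && res != NULL)
                        error = (long)res;
        }

        return error;
}

int main(void)
{
        return start_and_join_workers(4);
}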