Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding

On Mon, Apr 08, 2024 at 01:00:59PM +0100, Alan Maguire wrote:
> On 04/04/2024 09:58, Alan Maguire wrote:
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
> >     at /home/almagui/src/dwarves/dwarves.c:612
> > 612		return id >= pt->nr_entries ? NULL : pt->entries[id];
> > [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))]
> > (gdb) print *(struct ptr_table *)0x7f8c60001e70
> > $1 = {entries = 0x0, nr_entries = 2979, allocated_entries = 4096}
> > (gdb)

> > So it looks like the ptr_table has 2979 entries but entries is NULL;
> > could there be an issue where CU initialization is not yet complete
> > for some threads (it also happens very early in processing)? Can you
> > reproduce this failure at your end? Thanks!
 
> the following (when applied on top of the series) resolves the
> segmentation fault for me:
 
> diff --git a/pahole.c b/pahole.c
> index 6c7e738..5ff0eaf 100644
> --- a/pahole.c
> +++ b/pahole.c
> @@ -3348,8 +3348,8 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
>                 if (conf_load->reproducible_build) {
>                         ret = LSK__KEEPIT; // we're not processing the cu passed to this function, so keep it.
> -                        // Equivalent to LSK__DELETE since we processed this
> -                       cus__remove(cus, cu);
> -                       cu__delete(cu);
>                 }
>  out_btf:
>                 if (!thr_data) // See comment about reproducibe_build above
> 

Yeah, Jiri also pointed out that this call to cu__delete() was new. I was
trying to avoid having unprocessed 'struct cu' instances using too much
memory, so each one was deleted after processing, but as you found out
there are still references to that memory...

> In other words, the problem is we remove/delete CUs when finished with
> them in each thread (when BTF is generated).  However because the
> save/add_saved_funcs stashes CU references in the associated struct
> function * (to allow prototype comparison for the same function in
> different CUs), we end up with stale CU references and in this case the
> freed/nulled ptr_table caused an issue. As far as I can see we need to
> retain CUs until all BTF has been merged from threads.
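
A minimal sketch of the lifetime rule described above, assuming hypothetical
toy_* names that only stand in for the real dwarves structures: saved
functions hold a borrowed CU pointer, so the CUs are deleted only after every
saved function has been looked at. Note that the lookup has the same shape as
the check in the backtrace, which guards the index but not the array pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the real dwarves structures. */
struct toy_cu {
	void **entries;          /* per-CU table, released together with the CU */
	unsigned int nr_entries;
};

struct toy_function {
	struct toy_cu *cu;       /* borrowed reference kept for later comparison */
	unsigned int proto_id;
};

/* Bounds-checks the index only; a freed or NULL table is not detected. */
static void *toy_cu__entry(const struct toy_cu *cu, unsigned int id)
{
	return id >= cu->nr_entries ? NULL : cu->entries[id];
}

static struct toy_cu *toy_cu__new(unsigned int nr)
{
	struct toy_cu *cu = calloc(1, sizeof(*cu));

	if (cu == NULL)
		return NULL;
	cu->entries = calloc(nr, sizeof(*cu->entries));
	if (cu->entries == NULL) {
		free(cu);
		return NULL;
	}
	for (unsigned int i = 0; i < nr; i++)
		cu->entries[i] = cu; /* any non-NULL placeholder value */
	cu->nr_entries = nr;
	return cu;
}

static void toy_cu__delete(struct toy_cu *cu)
{
	if (cu == NULL)
		return;
	free(cu->entries);
	free(cu);
}

/*
 * Resolve every saved function while all CUs are still alive, and only
 * then delete the CUs. Returns how many functions resolved to an entry.
 */
static int toy_merge(struct toy_cu **cus, size_t nr_cus,
		     const struct toy_function *funcs, size_t nr_funcs)
{
	int resolved = 0;

	for (size_t i = 0; i < nr_funcs; i++)
		if (toy_cu__entry(funcs[i].cu, funcs[i].proto_id) != NULL)
			resolved++;

	for (size_t i = 0; i < nr_cus; i++)
		toy_cu__delete(cus[i]);

	return resolved;
}
```

Deleting a toy_cu before toy_merge() runs would leave funcs[i].cu dangling,
which is the situation the removed cu__delete() call created.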
 
> With the fix in place, I'm seeing less than 100 msec difference between
> reproducible/non-reproducible vmlinux BTF generation; that's great!

Excellent!

I'll remove it and add a note crediting you with the removal, including
the explanation of why it's not possible to delete it at that point
(references to the associated 'struct function').

Perhaps we can save this info in some other way that allows us to free
the CU after having processed it, I'll think about it.
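
One possible direction, purely as an illustration and not what pahole does
today: copy whatever the later comparison needs out of the CU, so the saved
function owns its data and the CU can go away early. The demo_* names and the
choice of a prototype string as the saved data are assumptions made for this
sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; the real structures hold far more than strings. */
struct demo_cu {
	char **protos;           /* owned by the CU, freed with it */
	unsigned int nr_protos;
};

struct demo_saved_func {
	char *proto;             /* owned copy, independent of the CU lifetime */
};

static char *demo_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy != NULL)
		memcpy(copy, s, len);
	return copy;
}

static void demo_cu__delete(struct demo_cu *cu)
{
	if (cu == NULL)
		return;
	for (unsigned int i = 0; i < cu->nr_protos; i++)
		free(cu->protos[i]);
	free(cu->protos);
	free(cu);
}

static struct demo_cu *demo_cu__new(const char *const *protos, unsigned int nr)
{
	struct demo_cu *cu = calloc(1, sizeof(*cu));

	if (cu == NULL)
		return NULL;
	cu->protos = calloc(nr, sizeof(*cu->protos));
	if (cu->protos == NULL) {
		free(cu);
		return NULL;
	}
	for (unsigned int i = 0; i < nr; i++) {
		cu->protos[i] = demo_strdup(protos[i]);
		if (cu->protos[i] == NULL) {
			demo_cu__delete(cu); /* frees the copies made so far */
			return NULL;
		}
		cu->nr_protos = i + 1;
	}
	return cu;
}

/* Copy the prototype out of the CU; returns 0 on success, -1 on failure. */
static int demo_saved_func__init(struct demo_saved_func *f,
				 const struct demo_cu *cu, unsigned int id)
{
	f->proto = id < cu->nr_protos ? demo_strdup(cu->protos[id]) : NULL;
	return f->proto != NULL ? 0 : -1;
}

static int demo_saved_func__same_proto(const struct demo_saved_func *a,
				       const struct demo_saved_func *b)
{
	return strcmp(a->proto, b->proto) == 0;
}

static void demo_saved_func__exit(struct demo_saved_func *f)
{
	free(f->proto);
	f->proto = NULL;
}
```

The trade-off is extra copying per saved function in exchange for being able
to release each CU as soon as its thread is done with it.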

But it's good to see that the difference is small, great!

Thanks a lot!

- Arnaldo



