Re: [PATCH dwarves] dwarves: set cu->obstack chunk size to 128Kb

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 08, 2025 at 05:22:56PM +0000, Alan Maguire wrote:
> On 08/01/2025 16:38, Ihor Solodrai wrote:
> > On Wednesday, January 8th, 2025 at 5:55 AM, Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
> > 
> >>
> >>
> >> On 21/12/2024 03:04, Ihor Solodrai wrote:
> >>
> >>> In dwarf_loader with growing nr_jobs the wall-clock time of BTF
> >>> encoding starts worsening after a certain point [1].
> >>>
> >>> While some overhead of additional threads is expected, it's not
> >>> supposed to be noticeable unless nr_jobs is set to an unreasonably big
> >>> value.
> >>>
> >>> It turns out when there are "too many" threads decoding DWARF, they
> >>> start competing for memory allocation: significant number of cycles is
> >>> spent in osq_lock - in the depth of malloc called within
> >>> cu__zalloc. Which suggests that many threads are trying to allocate
> >>> memory at the same time.
> >>>
> >>> See an example on a perf flamegraph for run with -j240 [2]. This is
> >>> 12-core machine, so the effect is small. On machines with more cores
> >>> this problem is worse.
> >>>
> >>> Increasing the chunk size of obstacks associated with CUs helps to
> >>> reduce the performance penalty caused by this race condition.
> >>
> >>
> >> Is this because starting with a larger obstack size means we don't have
> >> to keep reallocating as the obstack grows?
> > 
> > Yes. Bigger obstack size leads to lower number of malloc calls. The
> > mallocs tend to happen at the same time between threads in the case of
> > DWARF decoding.
> > 
> > Curiously, setting a higher obstack chunk size (like 1Mb), does not
> > improve the overall wall-clock time, and can even make it worse.
> > This happens because the kernel takes a different code path to allocate
> > bigger chunks of memory. And also most CUs are not big (at least in case
> > of vmlinux), so a bigger chunk size probably increases wasted memory.
> > 
> > 128Kb seems to be close to a sweet spot for the vmlinux.
> > The default is 4Kb.
> >
> 
> Thanks for the additional details!
> 
> Reviewed-by: Alan Maguire <alan.maguire@xxxxxxxxxx>

I'm adding these details and your reviewed-by tag to that cset.

Thanks!

- Arnaldo




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux