On Wednesday, January 8th, 2025 at 5:55 AM, Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:

>
> On 21/12/2024 03:04, Ihor Solodrai wrote:
>
> > In dwarf_loader, as nr_jobs grows, the wall-clock time of BTF
> > encoding starts getting worse after a certain point [1].
> >
> > While some overhead from additional threads is expected, it's not
> > supposed to be noticeable unless nr_jobs is set to an unreasonably big
> > value.
> >
> > It turns out that when there are "too many" threads decoding DWARF, they
> > start competing for memory allocation: a significant number of cycles is
> > spent in osq_lock, deep inside malloc called from cu__zalloc. This
> > suggests that many threads are trying to allocate memory at the same
> > time.
> >
> > See an example perf flamegraph for a run with -j240 [2]. This is a
> > 12-core machine, so the effect is small. On machines with more cores
> > the problem is worse.
> >
> > Increasing the chunk size of the obstacks associated with CUs helps to
> > reduce the performance penalty caused by this contention.
>
> Is this because starting with a larger obstack size means we don't have
> to keep reallocating as the obstack grows?

Yes. A bigger obstack chunk size means fewer malloc calls, and in the
case of DWARF decoding those mallocs tend to happen at the same time
across threads.

Curiously, setting an even higher obstack chunk size (like 1MB) does not
improve the overall wall-clock time, and can even make it worse. This
happens because the kernel takes a different code path to allocate
bigger chunks of memory. Also, most CUs are not big (at least in the
case of vmlinux), so a bigger chunk size probably just increases wasted
memory.

128KB seems to be close to a sweet spot for vmlinux. The default is 4KB.

>
> Thanks!
>
> Alan
>
> > [1] https://lore.kernel.org/dwarves/C82bYTvJaV4bfT15o25EsBiUvFsj5eTlm17933Hvva76CXjIcu3gvpaOCWPgeZ8g3cZ-RMa8Vp0y1o_QMR2LhPB-LEUYfZCGuCfR_HvkIP8=@pm.me/
> > [2] https://gist.github.com/theihor/926af22417a78605fec8d85e1338920e
> >
> > Signed-off-by: Ihor Solodrai <ihor.solodrai@xxxxx>
> > ---
> >  dwarves.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/dwarves.c b/dwarves.c
> > index 7c3e878..105f81a 100644
> > --- a/dwarves.c
> > +++ b/dwarves.c
> > @@ -722,6 +722,8 @@ int cu__fprintf_ptr_table_stats_csv(struct cu *cu, FILE *fp)
> >  	return printed;
> >  }
> >  
> > +#define OBSTACK_CHUNK_SIZE (128*1024)
> > +
> >  struct cu *cu__new(const char *name, uint8_t addr_size,
> >  		   const unsigned char *build_id, int build_id_len,
> >  		   const char *filename, bool use_obstack)
> > @@ -733,7 +735,7 @@ struct cu *cu__new(const char *name, uint8_t addr_size,
> >  
> >  	cu->use_obstack = use_obstack;
> >  	if (cu->use_obstack)
> > -		obstack_init(&cu->obstack);
> > +		obstack_begin(&cu->obstack, OBSTACK_CHUNK_SIZE);
> >  
> >  	if (name == NULL || filename == NULL)
> >  		goto out_free;
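To make the "bigger chunk size means fewer malloc calls" point concrete, here is a minimal standalone sketch, assuming glibc's <obstack.h>. It routes the obstack's chunk allocations through a hypothetical counting wrapper (counting_malloc) and compares the default chunk size (what obstack_init uses) against the 128KB value from the patch. The helper name count_chunk_allocs and the object count/size are illustrative assumptions, not measurements from vmlinux or code from dwarves.

```c
#include <assert.h>
#include <obstack.h>
#include <stdlib.h>

/* Hypothetical counter: incremented every time the obstack asks for a
 * new chunk, i.e. every malloc call the obstack makes. */
static size_t chunk_allocs;

static void *counting_malloc(size_t size)
{
	chunk_allocs++;
	return malloc(size);
}

/* <obstack.h> expects the user to define these two macros. */
#define obstack_chunk_alloc counting_malloc
#define obstack_chunk_free free

/* Allocate n_objs objects of obj_size bytes from a fresh obstack whose
 * chunk size is chunk_size (0 selects glibc's default, exactly as
 * obstack_init does) and return how many chunks were malloc'ed. */
static size_t count_chunk_allocs(size_t chunk_size, size_t n_objs,
				 size_t obj_size)
{
	struct obstack ob;

	assert(obj_size > 0);
	chunk_allocs = 0;
	obstack_begin(&ob, chunk_size);
	for (size_t i = 0; i < n_objs; i++)
		(void)obstack_alloc(&ob, obj_size);
	obstack_free(&ob, NULL); /* release every chunk at once */
	return chunk_allocs;
}
```

For example, allocating 2000 objects of 64 bytes (128000 bytes in total) needs a single malloc with a 128KB chunk, but a few dozen with the default chunk of roughly 4KB. With many threads each decoding their own CU, every one of those mallocs is another chance to contend on the allocator.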