On 08/01/2025 16:38, Ihor Solodrai wrote: > On Wednesday, January 8th, 2025 at 5:55 AM, Alan Maguire <alan.maguire@xxxxxxxxxx> wrote: > >> >> >> On 21/12/2024 03:04, Ihor Solodrai wrote: >> >>> In dwarf_loader with growing nr_jobs the wall-clock time of BTF >>> encoding starts worsening after a certain point [1]. >>> >>> While some overhead of additional threads is expected, it's not >>> supposed to be noticeable unless nr_jobs is set to an unreasonably big >>> value. >>> >>> It turns out when there are "too many" threads decoding DWARF, they >>> start competing for memory allocation: significant number of cycles is >>> spent in osq_lock - in the depth of malloc called within >>> cu__zalloc. Which suggests that many threads are trying to allocate >>> memory at the same time. >>> >>> See an example on a perf flamegraph for run with -j240 [2]. This is >>> 12-core machine, so the effect is small. On machines with more cores >>> this problem is worse. >>> >>> Increasing the chunk size of obstacks associated with CUs helps to >>> reduce the performance penalty caused by this race condition. >> >> >> Is this because starting with a larger obstack size means we don't have >> to keep reallocating as the obstack grows? > > Yes. Bigger obstack size leads to lower number of malloc calls. The > mallocs tend to happen at the same time between threads in the case of > DWARF decoding. > > Curiously, setting a higher obstack chunk size (like 1Mb), does not > improve the overall wall-clock time, and can even make it worse. > This happens because the kernel takes a different code path to allocate > bigger chunks of memory. And also most CUs are not big (at least in case > of vmlinux), so a bigger chunk size probably increases wasted memory. > > 128Kb seems to be close to a sweet spot for the vmlinux. > The default is 4Kb. > Thanks for the additional details! Reviewed-by: Alan Maguire <alan.maguire@xxxxxxxxxx> >> >> Thanks! >> >> Alan >> >>> [1] https://lore.kernel.org/dwarves/C82bYTvJaV4bfT15o25EsBiUvFsj5eTlm17933Hvva76CXjIcu3gvpaOCWPgeZ8g3cZ-RMa8Vp0y1o_QMR2LhPB-LEUYfZCGuCfR_HvkIP8=@pm.me/ >>> [2] https://gist.github.com/theihor/926af22417a78605fec8d85e1338920e >>> >>> Signed-off-by: Ihor Solodrai ihor.solodrai@xxxxx >>> --- >>> dwarves.c | 4 +++- >>> 1 file changed, 3 insertions(+), 1 deletion(-) >>> >>> diff --git a/dwarves.c b/dwarves.c >>> index 7c3e878..105f81a 100644 >>> --- a/dwarves.c >>> +++ b/dwarves.c >>> @@ -722,6 +722,8 @@ int cu__fprintf_ptr_table_stats_csv(struct cu *cu, FILE *fp) >>> return printed; >>> } >>> >>> +#define OBSTACK_CHUNK_SIZE (128*1024) >>> + >>> struct cu *cu__new(const char *name, uint8_t addr_size, >>> const unsigned char *build_id, int build_id_len, >>> const char *filename, bool use_obstack) >>> @@ -733,7 +735,7 @@ struct cu *cu__new(const char *name, uint8_t addr_size, >>> >>> cu->use_obstack = use_obstack; >>> if (cu->use_obstack) >>> - obstack_init(&cu->obstack); >>> + obstack_begin(&cu->obstack, OBSTACK_CHUNK_SIZE); >>> >>> if (name == NULL || filename == NULL) >>> goto out_free;