On Wed, Sep 11, 2024 at 7:28 PM David Wang <00107082@xxxxxxx> wrote:
>
> At 2024-07-02 05:58:50, "Kent Overstreet" <kent.overstreet@xxxxxxxxx> wrote:
> >On Mon, Jul 01, 2024 at 10:23:32AM GMT, David Wang wrote:
> >> Hi Suren,
> >>
> >> At 2024-07-01 03:33:14, "Suren Baghdasaryan" <surenb@xxxxxxxxxx> wrote:
> >> >On Mon, Jun 17, 2024 at 8:33 AM David Wang <00107082@xxxxxxx> wrote:
> >> >>
> >> >> An accumulated call counter can be used to evaluate the rate of
> >> >> memory allocation via delta(counters)/delta(time). This metric can
> >> >> help analyze performance behaviour, e.g. when tuning cache sizes.
> >> >
> >> >Sorry for the delay, David.
> >> >IIUC with this counter you can identify the number of allocations ever
> >> >made from a specific code location. Could you please clarify the usage
> >> >a bit more? Is the goal to see which locations are the most active and
> >> >the rate at which allocations are made there? How will that
> >> >information be used?
> >>
> >> Cumulative counters can be sampled together with a timestamp: say at T1
> >> a monitoring tool reads a value V1, and one sampling interval later, at
> >> T2, it reads a value V2. The average allocation rate can then be
> >> evaluated as (V2-V1)/(T2-T1). (The accuracy depends on the sampling
> >> interval.)
> >>
> >> This information may help identify where memory allocation is
> >> unnecessarily frequent, so some performance could be gained by
> >> allocating less often. The performance "gain" is just a guess; I do
> >> not have a valid example.
> >
> >Easier to just run perf...
>
> Hi,
>
> To Kent:
> It is oddly fitting to be replying to this while I was debugging a
> performance issue in bcachefs :)
>
> Yes, it is true that performance bottlenecks can be identified with perf,
> but normally perf is not running continuously (although there are some
> continuous-profiling projects out there). Also, memory allocation is
> usually not the biggest bottleneck, so its impact may not be easily
> picked up by perf.
>
> Well, in the case of
> https://lore.kernel.org/lkml/20240906154354.61915-1-00107082@xxxxxxx/,
> the memory allocation was picked up by perf, though. But with this patch
> it is easier to spot that the allocation behavior is quite different:
> when performance was bad, the average rate for
> "fs/bcachefs/io_write.c:113 func:__bio_alloc_page_pool" was 400k/s,
> while when performance was good, the rate was less than 200/s.
>
> (I have a sampling tool that collects /proc/allocinfo; the data is stored
> in Prometheus, and the rate is calculated and plotted via the Prometheus
> expression:
> irate(mem_profiling_count_total{file=~"fs/bcachefs.*", func="__bio_alloc_page_pool"}[5m]))
>
> I hope this is a valid example demonstrating the usefulness of cumulative
> memory allocation counters for performance issues.

Hi David,
I agree with Kent that this feature should be behind a kconfig flag. We
don't want to impose the overhead on users who do not need this feature.
Thanks,
Suren.

>
> Thanks
> David
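
For reference, a minimal sketch of the kind of /proc/allocinfo sampler David
describes: it reads the file twice and prints per-callsite allocation rates
as delta(counter)/delta(time), i.e. the same (V2-V1)/(T2-T1) calculation
quoted above. The data-line layout ("<bytes> <calls> <tag>") and the column
holding the proposed cumulative call counter are assumptions here, not taken
from the patch; adjust COUNTER_COLUMN to whatever the patched file exposes.

#!/usr/bin/env python3
# Sketch: sample /proc/allocinfo twice and print per-callsite allocation
# rates as delta(counter)/delta(time).  The column index of the cumulative
# call counter is an assumption -- adjust it for the actual file format.
import time

ALLOCINFO = "/proc/allocinfo"
COUNTER_COLUMN = 1   # assumed position of the cumulative call counter
INTERVAL = 5.0       # sampling interval in seconds

def sample():
    """Return {tag: counter}, skipping header/comment lines."""
    counters = {}
    with open(ALLOCINFO) as f:
        for line in f:
            parts = line.split()
            # Skip lines that do not have a numeric field in the counter column.
            if len(parts) <= COUNTER_COLUMN or not parts[COUNTER_COLUMN].isdigit():
                continue
            # Tag is everything after the counter column,
            # e.g. "fs/bcachefs/io_write.c:113 func:__bio_alloc_page_pool".
            tag = " ".join(parts[COUNTER_COLUMN + 1:])
            counters[tag] = int(parts[COUNTER_COLUMN])
    return counters

t1, v1 = time.time(), sample()
time.sleep(INTERVAL)
t2, v2 = time.time(), sample()

rates = {tag: (v2[tag] - v1.get(tag, 0)) / (t2 - t1) for tag in v2}
for tag, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:20]:
    print(f"{rate:12.1f} allocs/s  {tag}")

For longer-running monitoring, the same per-tag counters can instead be
exported to Prometheus, as David does, and the rate derived with irate()
rather than computing the delta by hand.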