Re: [PATCH 0/2] Introduce panic function when slub leaks

Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> · Fri, 27 Sep 2024 17:01:37 +0900

On Fri, Sep 27, 2024 at 4:28 PM zhang fangzheng
<fangzheng.zhang1003@xxxxxxxxx> wrote:
>
> On Thu, Sep 26, 2024 at 8:30 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >
> > On 9/25/24 15:18, Hyeonggon Yoo wrote:
> > > On Wed, Sep 25, 2024 at 12:23 PM Fangzheng Zhang
> > > <fangzheng.zhang@xxxxxxxxxx> wrote:
> > >>
> > >> Hi all,
> > >
> > > Hi Fangzheng,
> > >
> > >> A method to detect slub leaks by monitoring its usage in real time
> > >> on the page allocation path of the slub. When the slub occupancy
> > >> exceeds the user-set value, it is considered that the slub is leaking
> > >> at this time
> > >
> > > I'm not sure why this should be a kernel feature. Why not write a user
> > > script that parses
> > > MemTotal: and Slab: part of /proc/meminfo file and generates a log
> > > entry or an alarm?
> >
> > Yes very much agreed. It seems rather arbitrary. Why slab, why not any other
> > kernel-specific counter in /proc/meminfo? Why include NR_SLAB_RECLAIMABLE_B
> > when that's used by caches with shrinkers?
>
> Ok, this is because the current consideration is to specifically
> track the memory usage of the slab module.
> In the stability test, ie, monkey test,
> the anr or reboot problem occurs, there is a high probability
> that the slab occupancy is high when it comes to memory analysis.
> In addition to directly monitoring leaks in the allocation path, it is
> also convenient to record the allocation stack information
> when an exception occurs.

[+Cc Memory Allocation Profiling maintainers]

For recording allocation information, I think CONFIG_MEM_ALLOC_PROFILING [1] [2]
could be used to track the allocation sites that contribute to memory leaks,
instead of making the kernel panic or print a WARNING.
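
As a rough illustration of how the profiling data could be consumed from
userspace, here is a minimal sketch that ranks allocation sites by bytes
currently allocated. It assumes profiling is enabled and that each data line
of /proc/allocinfo has the form "<bytes> <calls> <tag info>", as in the
examples in [1]. The helper names parse_allocinfo() and top_sites() are made
up for this example.

```python
def parse_allocinfo(text):
    """Parse /proc/allocinfo-style text into (bytes, calls, tag) tuples.

    Assumed line format: "<bytes> <calls> <tag info>". Header or comment
    lines that do not start with two integers are skipped.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        if not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        entries.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return entries


def top_sites(text, n=10):
    """Return the n allocation sites holding the most bytes, largest first."""
    return sorted(parse_allocinfo(text), key=lambda e: e[0], reverse=True)[:n]
```

Feeding the contents of /proc/allocinfo to top_sites() at intervals and
comparing the results would show which call sites keep growing.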

Or, with higher overhead, slub_debug=U [3], if this is not meant to
run in production.

[1] https://docs.kernel.org/mm/allocation-profiling.html
[2] https://lwn.net/Articles/974380
[3] https://docs.kernel.org/mm/slub.html#debugfs-files-for-slub

Best,
Hyeonggon

> > A userspace solution should be straightforward and universal - easily
> > configurable for different scenarios.
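
A minimal sketch of such a userspace check, for illustration only. It parses
the MemTotal: and Slab: fields of /proc/meminfo and compares their ratio
against a limit. The function names and the 0.30 default threshold are
hypothetical, not taken from the patch or from any existing tool.

```python
def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to integer values.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def slab_ratio(fields):
    """Fraction of MemTotal currently accounted to Slab."""
    return fields["Slab"] / fields["MemTotal"]


def slab_over_threshold(text, threshold=0.30):
    """True if Slab / MemTotal exceeds threshold (hypothetical example value)."""
    return slab_ratio(parse_meminfo(text)) > threshold
```

A monitoring script could call slab_over_threshold() on the contents of
/proc/meminfo periodically and write a log entry or raise an alarm when it
returns True, with the threshold tuned per device or test scenario.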
> >
> > >> and a panic operation will be triggered immediately.
> > >
> > > I don't think it would be a good idea to panic unnecessarily.
> > > IMO it is not proper to panic when the kernel can still run.
> >
> > Yes these days it's practically impossible to add a BUG_ON() for more
> > serious conditions than this.
> >
> > Please don't post new versions addressing specific implementation details
> > until this fundamental issue is addressed.
> >
> > Thanks,
> > Vlastimil
> >
> > > Any thoughts?
> > >
> > > Thanks,
> > > Hyeonggon
> >