Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc

Namhyung Kim <namhyung@xxxxxxxxxx> · Thu, 10 Oct 2024 15:56:38 -0700

On Thu, Oct 10, 2024 at 10:04:24AM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 10, 2024 at 9:46 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> >
> > On Wed, Oct 09, 2024 at 12:17:12AM -0700, Namhyung Kim wrote:
> > > On Mon, Oct 07, 2024 at 02:57:08PM +0200, Vlastimil Babka wrote:
> > > > On 10/4/24 11:25 PM, Roman Gushchin wrote:
> > > > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > > >> On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > > > >>>
> > > > > >>> bpf_get_kmem_cache() gets slab cache information from a virtual
> > > > > >>> address, like virt_to_cache().  If the address points to a slab
> > > > > >>> object, it returns a valid kmem_cache pointer; otherwise it
> > > > > >>> returns NULL.
> > > > >>>
> > > > > >>> It doesn't take a reference on the kmem_cache, so the caller is
> > > > > >>> responsible for managing access to it.  The intended use case for now
> > > > > >>> is to symbolize locks in slab objects from the lock contention
> > > > > >>> tracepoints.
> > > > >>>
> > > > >>> Suggested-by: Vlastimil Babka <vbabka@xxxxxxx>
> > > > >>> Acked-by: Roman Gushchin <roman.gushchin@xxxxxxxxx> (mm/*)
> > > > >>> Acked-by: Vlastimil Babka <vbabka@xxxxxxx> #mm/slab
> > > > >>> Signed-off-by: Namhyung Kim <namhyung@xxxxxxxxxx>
> > > >
> > > >
> > > > So IIRC from our discussions with Namhyung and Arnaldo at LSF/MM I
> > > > thought the perf use case was:
> > > >
> > > > - at the beginning it iterates the kmem caches and stores anything of
> > > > possible interest in bpf maps or somewhere - hence we have the iterator
> > > > - during profiling, from object it gets to a cache, but doesn't need to
> > > > access the cache - just store the kmem_cache address in the perf record
> > > > - after profiling itself, use the information in the maps from the first
> > > > step together with cache pointers from the second step to calculate
> > > > whatever is necessary
> > >
> > > Correct.
> > >
> > > >
> > > > So at no point it should be necessary to take refcount to a kmem_cache?
> > > >
> > > > But maybe "bpf_get_kmem_cache()" is implemented here as too generic
> > > > given the above use case and it should be implemented in a way that the
> > > > pointer it returns cannot be used to access anything (which could be
> > > > unsafe), but only as a bpf map key - so it should return e.g. an
> > > > unsigned long instead?
> > >
> > > Yep, this should work for my use case.  Maybe we don't need the
> > > iterator if the bpf_get_kmem_cache() kfunc returns a valid pointer, as
> > > we could get the necessary info at that moment.  But I think it'd be
> > > less efficient, as more work would need to be done at the event (lock
> > > contention).  It'd be better to set up the necessary info in a map
> > > before monitoring (using the iterator), and just look up the map with
> > > the kfunc while monitoring the lock contention.
> >
> > Maybe it's still better to return a non-refcounted pointer for future
> > use.  I'll leave it for v5.
> 
> Pls keep it as:
> __bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> 
> just make sure it's PTR_UNTRUSTED.

Sure, will do.

> No need to make it return long or void *.
> The users can do:
>   bpf_core_cast(any_value, struct kmem_cache);
> anyway, but it would be an unnecessary step.

Yeah I thought there would be a way to do that.
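
For illustration only, here is a rough user-space C model of the
pointer-as-key idea discussed above: cache info is recorded up front, and
the kmem_cache address is later used purely as a lookup key, never
dereferenced.  This is not BPF code and does not call the kfunc;
cache_table, record_cache() and lookup_cache_name() are made-up names, and
a real tool would presumably use a BPF hash map rather than a linear array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel or libbpf APIs. */
#define MAX_CACHES 64
#define NAME_LEN   32

struct cache_info {
	uintptr_t key;        /* kmem_cache address, treated as an opaque key */
	char name[NAME_LEN];  /* info captured before profiling starts */
	int used;
};

/* Zero-initialized, so every slot starts out unused. */
static struct cache_info cache_table[MAX_CACHES];

/* Setup step (models the iterator pass): store a cache name under its address. */
static int record_cache(uintptr_t cache_addr, const char *name)
{
	for (size_t i = 0; i < MAX_CACHES; i++) {
		if (!cache_table[i].used) {
			cache_table[i].key = cache_addr;
			strncpy(cache_table[i].name, name, NAME_LEN - 1);
			cache_table[i].name[NAME_LEN - 1] = '\0';
			cache_table[i].used = 1;
			return 0;
		}
	}
	return -1; /* table full */
}

/*
 * Event/post-processing step: find the recorded name by address alone.
 * The address is only compared, never dereferenced.
 */
static const char *lookup_cache_name(uintptr_t cache_addr)
{
	for (size_t i = 0; i < MAX_CACHES; i++) {
		if (cache_table[i].used && cache_table[i].key == cache_addr)
			return cache_table[i].name;
	}
	return NULL; /* address was not recorded during setup */
}
```

Since the lookup only compares addresses, a stale or untrusted pointer
value can at worst miss in the table; it cannot cause a bad memory access.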

Thanks,
Namhyung