Re: [PATCH RFC bpf-next v2 5/5] selftests/bpf: test for pinning for cgroup_view link

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Mon, 14 Feb 2022 12:27:54 -0800

On Mon, Feb 14, 2022 at 12:23 PM Hao Luo <haoluo@xxxxxxxxxx> wrote:
>
> On Mon, Feb 14, 2022 at 11:25 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Mon, Feb 14, 2022 at 10:29 AM Hao Luo <haoluo@xxxxxxxxxx> wrote:
> > > Hi Alexei,
> > >
> > > Actually, I found this almost worked, except that the tracepoints
> > > cgroup_mkdir and cgroup_rmdir are not sleepable. They are inside a
> > > spinlock's critical section with irq off. I guess one solution is to
> > > offload the sleepable part of the bpf prog into a thread context. We
> > > may create a dedicated kernel thread or use workqueue for this. Do you
> > > have any advice?
> >
> > Are you referring to spin_lock in TRACE_CGROUP_PATH
> > that protects global trace_cgroup_path[] buffer?
>
> Yes, that's the spin_lock I am talking about.
>
> > That is fixable.
> > Do you actually need the string path returned by cgroup_path() in bpf prog?
> > Maybe prog can call cgroup_path() by itself when necessary.
> > Parsing strings isn't great anyway. The bpf prog probably needs the last
> > part of the dir only. So cgrp->kn->name would do it?
> > The TRACE_CGROUP_PATH wasn't designed to be turned on 24/7.
> > That global spin_lock is not great for production use.
> > No need to delegate sleepable bpf to thread context.
> > Let's refactor that tracepoint a bit.
>
> No, we don't need cgroup_path(). We are going to name the directories
> by cgrp->kn->id. I can add a fast version for cgroup_xxx tracepoints,
> which don't require the full path and can be turned on 24/7.

Sounds good. We need a flag for tracepoints anyway to indicate
which ones are sleepable.
Probably similar to what we did for DEFINE_EVENT_WRITABLE.