Re: [PATCH v3 bpf-next 0/4] Add support for cgroup bpf_link

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 31 Mar 2020 17:45:27 -0700

On Tue, Mar 31, 2020 at 3:44 PM David Ahern <dsahern@xxxxxxxxx> wrote:
>
> On 3/31/20 3:51 PM, Edward Cree wrote:
> > On 31/03/2020 04:54, Andrii Nakryiko wrote:
> >> No need to kill random processes, you can kill only those that hold
> >> bpf_link FD. You can find them using drgn tool with script like [0].
> > For the record, I find the argument "we don't need a query feature,
> >  because you can just use a kernel debugger" *utterly* *horrifying*.
> > Now, it seems to be moot, because Alexei has given other, better
> >  reasons why query doesn't need to land yet; but can we please not
> >  ever treat debugging interfaces as a substitute for proper APIs?
> >
> > </scream>
> > -ed
> >
>
> just about to send the same intent. Dev packages and processing
> /proc/kcore is not a proper observability API for production systems.

I'm not against observability. LINK_QUERY is going to be added. I'm
also looking into making bpf_link into "lookup-able by id" object,
similar to bpf_map and bpf_prog, which will allow to easily just say
"show me all the BPF attachments in the system", which is impossible
to do right now, btw.

As for the drgn and /proc/kcore. drgn is an awesome tool to do lots of
inner kernel API observability stuff, which is impractical to expose
through stable APIs. But you don't have to use it to get the same
effect. The problem that script is solving is to show all the
processes that have open FD to bpf_link files. This is the same
problem fuser command is solving for normal files, but solution is
similar. fuser seems to be doing it iterating over all processes and
its FDs in procfs. Not the most efficient way, but it works. Here's
what you can get for cgroup bpf_link file with my last patch set
already:

# cat /proc/1366584/fdinfo/14
pos:    0
flags:  02000000
mnt_id: 14
link_type:      cgroup
prog_tag:       9ad187367cf2b9e8
prog_id:        1649

We can extend that information further with relevant details. This is
a good and bigger discussion for LINK_QUERY API as well, given it and
fdinfo might be treated as two ways to get same information. This is
one reason I didn't do it for cgroup bpf_link, there are already
enough related discussions to keep us all involved for more than a
week now.

But it would be nice to start discussing and figuring out these
relevant details, instead of being horrified and terrified, and
spreading FUD. Or inventing ways to violate good properties of
bpf_link (e.g., by forceful nuking) due to theoretical worries about
the need to detach bpf_link without finding application or pinned file
that holds it. As Alexei mentioned, what's there already (raw_tp,
tracing, and now cgroup bpf_links) is no worse than what we had
before. By the time we get to XDP bpf_link, we'll have even more
observability capabilities.