Re: [PATCH v3 bpf-next 0/4] Add support for cgroup bpf_link

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 30 Mar 2020 13:20:17 -0700

On Mon, Mar 30, 2020 at 7:49 AM David Ahern <dsahern@xxxxxxxxx> wrote:
>
> On 3/29/20 8:59 PM, Andrii Nakryiko wrote:
> > bpf_link abstraction itself was formalized in [0] with justifications for why
> > its semantics is a good fit for attaching BPF programs of various types. This
> > patch set adds bpf_link-based BPF program attachment mechanism for cgroup BPF
> > programs.
> >
> > Cgroup BPF link is semantically compatible with current BPF_F_ALLOW_MULTI
> > semantics of attaching cgroup BPF programs directly. Thus cgroup bpf_link can
> > co-exist with legacy BPF program multi-attachment.
> >
> > bpf_link is destroyed and automatically detached when the last open FD holding
> > the reference to bpf_link is closed. This means that by default, when the
> > process that created bpf_link exits, attached BPF program will be
> > automatically detached due to bpf_link's clean up code. Cgroup bpf_link, like
> > any other bpf_link, can be pinned in BPF FS and by those means survive the
> > exit of process that created the link. This is useful in many scenarios to
> > provide long-living BPF program attachments. Pinning also means that there
> > could be many owners of bpf_link through independent FDs.
> >
> > Additionally, auto-detachment of cgroup bpf_link is implemented. When cgroup is
> > dying it will automatically detach all active bpf_links. This ensures that
> > cgroup clean up is not delayed by an active bpf_link when there is no chance
> > for any BPF program to be run for a given cgroup. In that sense it's similar
> > to existing behavior of dropping refcnt of attached bpf_prog. But in the case
> > of bpf_link, bpf_link is not destroyed and is still available to user as long
> > as at least one active FD is still open (or if it's pinned in BPF FS).
> >
> > There are two main cgroup-specific differences between bpf_link-based and
> > direct bpf_prog-based attachment.
> >
> > First, as opposed to direct bpf_prog attachment, cgroup itself doesn't "own"
> > bpf_link, which makes it possible to auto-clean up attached bpf_link when user
> > process abruptly exits without explicitly detaching BPF program. This makes
> > for a safe default behavior proven in BPF tracing program types. But bpf_link
> > doesn't bump cgroup->bpf.refcnt either and because of that doesn't prevent
> > cgroup from cleaning up its BPF state.
> >
> > Second, only owners of bpf_link (those who created bpf_link in the first place
> > or obtained a new FD by opening bpf_link from BPF FS) can detach and/or update
> > it. This makes sure that no other process can accidentally remove/replace BPF
> > program.
> >
> > This patch set also implements the LINK_UPDATE sub-command, which allows
> > replacing bpf_link's underlying bpf_prog, similarly to BPF_F_REPLACE flag
> > behavior for direct bpf_prog cgroup attachment. Similarly to LINK_CREATE, it
> > is supposed to be generic command for different types of bpf_links.
> >
>
> The observability piece should go in the same release as the feature.

You mean the LINK_QUERY command I mentioned before? Yes, I'm working on
adding it next, regardless of whether this patch set goes in right now
or later.
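
For anyone following along, here is a rough userspace sketch of the
create / pin / update flow described in the cover letter, going through
the raw bpf(2) syscall. It assumes the attr layout (link_create,
link_update) found in linux/bpf.h from 5.7 onwards, which may not match
this v3 exactly. The helper names (sys_bpf, fill_link_create,
fill_link_update, cgroup_link_create, link_update, link_pin) are made up
for illustration and are not part of the patch set or of libbpf.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around bpf(2); glibc does not provide one. */
static long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

/* Prepare attr for BPF_LINK_CREATE: attach prog_fd to the cgroup
 * referenced by cgroup_fd (an FD of the cgroup directory). */
static void fill_link_create(union bpf_attr *attr, int prog_fd,
			     int cgroup_fd, enum bpf_attach_type type)
{
	memset(attr, 0, sizeof(*attr));
	attr->link_create.prog_fd = prog_fd;
	attr->link_create.target_fd = cgroup_fd;
	attr->link_create.attach_type = type;
}

/* Prepare attr for BPF_LINK_UPDATE: swap the link's program for
 * new_prog_fd unconditionally (flags == 0, no old_prog_fd check). */
static void fill_link_update(union bpf_attr *attr, int link_fd,
			     int new_prog_fd)
{
	memset(attr, 0, sizeof(*attr));
	attr->link_update.link_fd = link_fd;
	attr->link_update.new_prog_fd = new_prog_fd;
	attr->link_update.flags = 0;
}

/* Returns a new link FD on success, -1 with errno set on failure.
 * Closing the last FD (with no pin) detaches the program. */
static int cgroup_link_create(int prog_fd, int cgroup_fd,
			      enum bpf_attach_type type)
{
	union bpf_attr attr;

	fill_link_create(&attr, prog_fd, cgroup_fd, type);
	return sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
}

/* Only a holder of link_fd can replace the underlying program. */
static int link_update(int link_fd, int new_prog_fd)
{
	union bpf_attr attr;

	fill_link_update(&attr, link_fd, new_prog_fd);
	return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr));
}

/* Pin the link in BPF FS so it outlives the creating process. */
static int link_pin(int link_fd, const char *path)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.pathname = (__u64)(unsigned long)path;
	attr.bpf_fd = link_fd;
	return sys_bpf(BPF_OBJ_PIN, &attr, sizeof(attr));
}
```

A caller would do something like cgroup_link_create(prog_fd, cg_fd,
BPF_CGROUP_INET_EGRESS), optionally link_pin() the result under
/sys/fs/bpf, and later link_update() it with a new program FD. Without
the pin, closing the last link FD is what triggers the detach.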