Re: [PATCH v3 bpf-next 0/4] Add support for cgroup bpf_link

David Ahern <dsahern@xxxxxxxxx> · Mon, 30 Mar 2020 08:49:57 -0600

On 3/29/20 8:59 PM, Andrii Nakryiko wrote:
> bpf_link abstraction itself was formalized in [0] with justifications for why
> its semantics is a good fit for attaching BPF programs of various types. This
> patch set adds bpf_link-based BPF program attachment mechanism for cgroup BPF
> programs.
> 
> Cgroup BPF link is semantically compatible with current BPF_F_ALLOW_MULTI
> semantics of attaching cgroup BPF programs directly. Thus cgroup bpf_link can
> co-exist with legacy BPF program multi-attachment.
> 
> bpf_link is destroyed and automatically detached when the last open FD holding
> the reference to bpf_link is closed. This means that by default, when the
> process that created bpf_link exits, attached BPF program will be
> automatically detached due to bpf_link's clean up code. Cgroup bpf_link, like
> any other bpf_link, can be pinned in BPF FS and by those means survive the
> exit of process that created the link. This is useful in many scenarios to
> provide long-living BPF program attachments. Pinning also means that there
> could be many owners of bpf_link through independent FDs.
> 
> Additionally, auto-detachment of cgroup bpf_link is implemented. When cgroup is
> dying it will automatically detach all active bpf_links. This ensures that
> cgroup clean up is not delayed due to active bpf_link even despite no chance
> for any BPF program to be run for a given cgroup. In that sense it's similar
> to existing behavior of dropping refcnt of attached bpf_prog. But in the case
> of bpf_link, bpf_link is not destroyed and is still available to user as long
> as at least one active FD is still open (or if it's pinned in BPF FS).
> 
> There are two main cgroup-specific differences between bpf_link-based and
> direct bpf_prog-based attachment.
> 
> First, as opposed to direct bpf_prog attachment, cgroup itself doesn't "own"
> bpf_link, which makes it possible to auto-clean up attached bpf_link when user
> process abruptly exits without explicitly detaching BPF program. This makes
> for a safe default behavior proven in BPF tracing program types. But bpf_link
> doesn't bump cgroup->bpf.refcnt as well and because of that doesn't prevent
> cgroup from cleaning up its BPF state.
> 
> Second, only owners of bpf_link (those who created bpf_link in the first place
> or obtained a new FD by opening bpf_link from BPF FS) can detach and/or update
> it. This makes sure that no other process can accidentally remove/replace BPF
> program.
> 
> This patch set also implements LINK_UPDATE sub-command, which allows replacing
> bpf_link's underlying bpf_prog, similarly to BPF_F_REPLACE flag
> behavior for direct bpf_prog cgroup attachment. Similarly to LINK_CREATE, it
> is supposed to be generic command for different types of bpf_links.
> 

The observability piece should go in the same release as the feature.