On Wed, Dec 18, 2019 at 11:45 PM Andrey Ignatov <rdna@xxxxxx> wrote:
>
> The common use-case in production is to have multiple cgroup-bpf
> programs per attach type that cover multiple use-cases. Such programs
> are attached with BPF_F_ALLOW_MULTI and can be maintained by different
> people.
>
> Order of programs usually matters. For example, imagine two egress
> programs: the first one drops packets and the second one counts packets.
> If they're swapped, the result of the counting program will be different.
>
> It brings operational challenges with updating cgroup-bpf program(s)
> attached with BPF_F_ALLOW_MULTI since there is no way to replace a
> program:
>
> * One way to update is to detach all programs first and then attach the
>   new version(s) again in the right order. This introduces an
>   interruption in the work a program is doing and may not be acceptable
>   (e.g. if it's an egress firewall);
>
> * Another way is to attach the new version of a program first and only
>   then detach the old version. This introduces a time interval when two
>   versions of the same program are running, which may not be acceptable
>   if a program is not idempotent. It also imposes an additional burden
>   on program developers to make sure that two versions of their program
>   can co-exist.
>
> Solve the problem by introducing a "replace" mode in BPF_PROG_ATTACH
> command for cgroup-bpf programs being attached with BPF_F_ALLOW_MULTI
> flag. This mode is enabled by the newly introduced BPF_F_REPLACE attach
> flag and the bpf_attr.replace_bpf_fd attribute to pass the fd of the old
> program to replace.
>
> That way a user can replace any program among those attached with
> BPF_F_ALLOW_MULTI flag without the problems described above.
>
> Details of the new API:
>
> * If BPF_F_REPLACE is set but replace_bpf_fd doesn't hold a valid
>   descriptor of a BPF program, BPF_PROG_ATTACH will return the
>   corresponding error (EINVAL or EBADF).
>
> * If replace_bpf_fd holds a valid descriptor of a BPF program but that
>   program is not attached to the specified cgroup, BPF_PROG_ATTACH will
>   return ENOENT.
>
> BPF_F_REPLACE is introduced to make the user intent clear, since
> replace_bpf_fd alone can't be used for this (its default value, 0, is a
> valid fd). BPF_F_REPLACE also makes it possible to extend the API in the
> future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed).
>
> Signed-off-by: Andrey Ignatov <rdna@xxxxxx>
> Acked-by: Martin KaFai Lau <kafai@xxxxxx>
> ---

Looks good.

Acked-by: Andrii Nakryiko <andriin@xxxxxx>

>  include/linux/bpf-cgroup.h     |  4 +++-
>  include/uapi/linux/bpf.h       | 10 ++++++++++
>  kernel/bpf/cgroup.c            | 30 ++++++++++++++++++++++++++----
>  kernel/bpf/syscall.c           |  4 ++--
>  kernel/cgroup/cgroup.c         |  5 +++--
>  tools/include/uapi/linux/bpf.h | 10 ++++++++++
>  6 files changed, 54 insertions(+), 9 deletions(-)
>
> [...]
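
For illustration only, here is a minimal userspace sketch of how the
replace mode described in the quoted commit message might be driven
through the raw bpf(2) syscall. It assumes uapi headers that already
carry BPF_F_REPLACE and bpf_attr.replace_bpf_fd; the helper names
fill_replace_attr() and cgroup_prog_replace() are hypothetical and not
part of the patch.

```c
#include <assert.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: prepare bpf_attr for BPF_PROG_ATTACH in replace
 * mode. The attr is zeroed first so that unused fields stay at 0.
 */
void fill_replace_attr(union bpf_attr *attr, int cgroup_fd,
                       int new_prog_fd, int old_prog_fd,
                       enum bpf_attach_type type)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->target_fd = cgroup_fd;        /* cgroup to attach to */
    attr->attach_bpf_fd = new_prog_fd;  /* new version of the program */
    attr->attach_type = type;
    /* BPF_F_REPLACE states the intent; fd 0 alone would be ambiguous. */
    attr->attach_flags = BPF_F_ALLOW_MULTI | BPF_F_REPLACE;
    attr->replace_bpf_fd = old_prog_fd; /* program being replaced */
}

/*
 * Hypothetical wrapper: returns 0 on success, -1 with errno set on
 * failure (the syscall's own convention).
 */
int cgroup_prog_replace(int cgroup_fd, int new_prog_fd, int old_prog_fd,
                        enum bpf_attach_type type)
{
    union bpf_attr attr;

    fill_replace_attr(&attr, cgroup_fd, new_prog_fd, old_prog_fd, type);
    return (int)syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
}
```

Per the description above, a caller of such a wrapper would expect
EINVAL or EBADF in errno when old_prog_fd is not a valid BPF program
descriptor, and ENOENT when that program is not attached to the given
cgroup.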