On Thu, Dec 03, 2020 at 10:05:44AM +0100, mariusz.dudek@xxxxxxxxx wrote:
> From: Mariusz Dudek <mariuszx.dudek@xxxxxxxxx>
>
> This patch series adds support for separating eBPF program
> load from xsk socket creation. In, for example, a Kubernetes
> environment you can have an AF_XDP CNI or daemonset that is
> responsible for launching pods that execute an application
> using AF_XDP sockets. It is desirable that the pod runs with
> as low privileges as possible, CAP_NET_RAW in this case,
> and that all operations that require privileges are contained
> in the CNI or daemonset.
>
> In this case, you have to be able to separate eBPF program load
> from xsk socket creation.
>
> Currently, this will not work with the xsk_socket__create APIs
> because you need CAP_NET_ADMIN privileges to load the eBPF
> program and CAP_SYS_ADMIN privileges to create and update
> xsk_bpf_maps. To be exact, xsk_set_bpf_maps does not need those
> privileges, but it takes the prog_fd and xsks_map_fd, and those are
> known only to the process that loaded the eBPF program. The API
> bpf_prog_get_fd_by_id, which looks up the fd of the prog using a
> prog_id, and bpf_map_get_fd_by_id, which looks up xsks_map_fd using
> a map_id, both require CAP_SYS_ADMIN.
>
> With this patch, the pod can be run with the CAP_NET_RAW capability
> only. If your umem is larger than or equal to the process limit for
> MEMLOCK, you need either to increase the limit or the CAP_IPC_LOCK
> capability. Without this patch, in case of insufficient rights,
> EPERM is returned by xsk_socket__create.
>
> To resolve this privileges issue, two new APIs are introduced:
> - xsk_setup_xdp_prog - loads the built-in XDP program. It can
>   also return xsks_map_fd, which is needed by the unprivileged
>   process to update xsks_map with the AF_XDP socket "fd"
> - xsk_socket__update_xskmap - inserts an AF_XDP socket into an
>   xskmap for a particular xsk_socket
>
> Usage example:
> int xsk_setup_xdp_prog(int ifindex, int *xsks_map_fd);
>
> int xsk_socket__update_xskmap(struct xsk_socket *xsk, int xsks_map_fd);
>
> Inserts AF_XDP socket "fd" into the xskmap.
>
> The first patch introduces the new APIs. The second patch provides
> a new sample application that acts as the control process and
> modifies the existing xdpsock application to work with fewer
> privileges.
>
> This patch set is based on bpf-next commit 97306be45fbe
> (Merge branch 'switch to memcg-based memory accounting')
>
> Since v6
> - rebase on 97306be45fbe to resolve RLIMIT conflicts

Applied, Thanks
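
The cover letter notes that xsks_map_fd is known only to the process that loaded the eBPF program, so the privileged side has to hand it to the unprivileged one somehow. The cover letter does not say which mechanism is used; one common option on Linux is to pass the descriptor as SCM_RIGHTS ancillary data over a Unix domain socket. The following is a minimal sketch under that assumption. The names send_fd, recv_fd and fd_passing_demo are hypothetical, and a pipe stands in for xsks_map_fd so the example runs without privileges or libbpf.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send one open descriptor over a connected AF_UNIX socket (hypothetical helper). */
static int send_fd(int sock, int fd)
{
    char byte = 0;                      /* at least one data byte must accompany the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;           /* forces correct alignment of the buffer */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent by send_fd(); returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/*
 * Round trip: the parent plays the privileged loader, the child plays the
 * unprivileged pod. The pipe's write end stands in for xsks_map_fd.
 * Returns 0 if the child could use the descriptor it received.
 */
static int fd_passing_demo(void)
{
    int sv[2], p[2], status = 0, rc;
    char buf[2] = { 0, 0 };
    ssize_t n;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(p) < 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: drop the inherited copies so only the passed fd can work. */
        close(sv[0]);
        close(p[0]);
        close(p[1]);
        int fd = recv_fd(sv[1]);
        if (fd < 0 || write(fd, "ok", 2) != 2)
            _exit(1);
        _exit(0);
    }
    close(sv[1]);
    rc = send_fd(sv[0], p[1]);
    close(p[1]);
    close(sv[0]);
    waitpid(pid, &status, 0);
    n = read(p[0], buf, sizeof(buf));
    close(p[0]);
    if (rc != 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (n == 2 && memcmp(buf, "ok", 2) == 0) ? 0 : -1;
}
```

In a real deployment the descriptor sent would be the xsks_map_fd filled in by xsk_setup_xdp_prog, and the receiver would pass the value returned by recv_fd to xsk_socket__update_xskmap; those two calls are left out here because they need libbpf and a network interface.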