	Hello,

On Thu, 7 Mar 2024, Michael Weiß wrote:

> Configuring ipvs in a non-initial user namespace using the genl
> netlink interface, e.g., by 'ipvsadm' is currently resulting in an
> '-EPERM'. This is due to the use of GENL_ADMIN_PERM flag in
> 'ip_vs_ctl.c'.
>
> Similarly to other genl interfaces, we switch to the use of
> GENL_UNS_ADMIN_PERM flag which allows connection from non-initial
> user namespace. Thus, it would be feasible to configure ipvs using
> the genl interface also from within an unprivileged system container.
>
> Since adding of new services and new dests are triggered from
> userspace, accounting for the corresponding memory allocations in
> ip_vs_new_dest() and ip_vs_add_service() is activated.
>
> We tested this by simply running some samples from "man ipvsadm"
> within an unprivileged user namespaced system container in GyroidOS.
> Further, we successfully passed an adapted version of the ipvs
> selftest in 'tools/testing/selftests/netfilter/ipvs.sh' using
> preliminary created network namespaces from unprivileged GyroidOS
> containers.

	I planned such a change, but as a follow-up patchset to other
work which converts many structures to be per-netns. There is an RFC v2
patchset for reference:

https://archive.linuxvirtualserver.org/html/lvs-devel/2023-12/index.html

	My goal was to isolate the different namespaces as much as
possible (different structures, different kthreads, etc.) in order to
reduce the security risks of giving power to unprivileged roots. Such
isolation should also help when namespaces are served from different
CPUs.

	Maybe I should push a fresh v3 soon, so that we can later use
GFP_KERNEL_ACCOUNT not only for services and dests but also for
allocations by schedulers, estimators, etc. The access to sysctl vars
should be enabled too (around the comment "Don't export sysctls to
unprivileged users"), together with
alloc_percpu => alloc_percpu_gfp(,GFP_KERNEL_ACCOUNT) and
SLAB_ACCOUNT for kmem_cache_create. I am not sure about __GFP_NOWARN
and __GFP_NORETRY usage either.
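
	To make the difference between the two genl flags in the quoted
patch concrete, here is a toy user-space model. It is not kernel code:
every type and helper below (toy_user_ns, toy_sender, toy_ns_capable,
toy_genl_perm_check) is hypothetical, and the capability walk is
simplified (it ignores the namespace-owner rule). It only assumes that
GENL_ADMIN_PERM checks CAP_NET_ADMIN against the initial user namespace
while GENL_UNS_ADMIN_PERM checks it against the user namespace that owns
the netns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_user_ns {
	const struct toy_user_ns *parent;	/* NULL for the initial user ns */
};

struct toy_sender {
	/* user ns in which the sender holds CAP_NET_ADMIN, NULL if none */
	const struct toy_user_ns *cap_ns;
};

enum {
	TOY_ADMIN_PERM     = 1,	/* stands in for GENL_ADMIN_PERM */
	TOY_UNS_ADMIN_PERM = 2,	/* stands in for GENL_UNS_ADMIN_PERM */
};

/*
 * Simplified rule: a capability held in namespace X is effective
 * over X and over every descendant of X, never over an ancestor.
 */
static bool toy_ns_capable(const struct toy_sender *s,
			   const struct toy_user_ns *target)
{
	if (!s->cap_ns)
		return false;
	for (const struct toy_user_ns *ns = target; ns; ns = ns->parent)
		if (ns == s->cap_ns)
			return true;
	return false;
}

/* Returns 0 if the command may run, -EPERM otherwise. */
static int toy_genl_perm_check(unsigned int flags,
			       const struct toy_sender *s,
			       const struct toy_user_ns *init_ns,
			       const struct toy_user_ns *net_owner_ns)
{
	/* ADMIN_PERM: capability is required in the initial user ns */
	if ((flags & TOY_ADMIN_PERM) && !toy_ns_capable(s, init_ns))
		return -EPERM;
	/* UNS_ADMIN_PERM: capability over the netns owner is enough */
	if ((flags & TOY_UNS_ADMIN_PERM) && !toy_ns_capable(s, net_owner_ns))
		return -EPERM;
	return 0;
}
```

	In this model a container root (capability only in a child user
ns) is refused with the first flag and accepted with the second, which
is why the remaining per-netns isolation and accounting work matters
before the flag is switched.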

	Not sure about the sysctl vars: now they are cloned from
init_net, so do we give full access for writing, should some remain
privileged, etc.

	I didn't push such changes yet because I'm not sure what is
needed: it looks like, for now, what was needed is root from init_net
to control rules in different netns, and there was no demand from the
virtualization world to extend this. If we can clearly define what is
good and what is bad from a security perspective, we can go with such
changes after pushing the above patchset, i.e. the GENL_UNS_ADMIN_PERM
change should follow all other changes.

> Signed-off-by: Michael Weiß <michael.weiss@xxxxxxxxxxxxxxxxxxx>
> ---
>  net/netfilter/ipvs/ip_vs_ctl.c | 36 +++++++++++++++++-----------------
>  1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 143a341bbc0a..d39120c64207 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -1080,7 +1080,7 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
>  		return -EINVAL;
>  	}
>
> -	dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL);
> +	dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL_ACCOUNT);
>  	if (dest == NULL)
>  		return -ENOMEM;
>
> @@ -1421,7 +1421,7 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
>  		ret_hooks = ret;
>  	}
>
> -	svc = kzalloc(sizeof(struct ip_vs_service), GFP_KERNEL);
> +	svc = kzalloc(sizeof(struct ip_vs_service), GFP_KERNEL_ACCOUNT);
>  	if (svc == NULL) {
>  		IP_VS_DBG(1, "%s(): no memory\n", __func__);
>  		ret = -ENOMEM;
> @@ -4139,98 +4139,98 @@ static const struct genl_small_ops ip_vs_genl_ops[] = {
>  	{
>  		.cmd	= IPVS_CMD_NEW_SERVICE,
>  		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
> -		.flags	= GENL_ADMIN_PERM,
> +		.flags	= GENL_UNS_ADMIN_PERM,
>  		.doit	= ip_vs_genl_set_cmd,

...

Regards

--
Julian Anastasov <ja@xxxxxx>