On 3/8/24 08:55, Julian Anastasov wrote: > > Hello, > > On Thu, 7 Mar 2024, Michael Weiß wrote: > >> Configuring ipvs in a non-initial user namespace using the genl >> netlink interface, e.g., by 'ipvsadm' is currently resulting in an >> '-EPERM'. This is due to the use of GENL_ADMIN_PERM flag in >> 'ip_vs_ctl.c'. >> >> Similarly to other genl interfaces, we switch to the use of >> GENL_UNS_ADMIN_PERM flag which allows connection from non-initial >> user namespace. Thus, it would be feasible to configure ipvs using >> the genl interface also from within an unprivileged system container. >> >> Since adding of new services and new dests are triggered from >> userspace, accounting for the corresponding memory allocations in >> ip_vs_new_dest() and ip_vs_add_service() is activated. >> >> We tested this by simply running some samples from "man ipvsadm" >> within an unprivileged user namespaced system container in GyroidOS. >> Further, we successfully passed an adapted version of the ipvs >> selftest in 'tools/testing/selftests/netfilter/ipvs.sh' using >> preliminary created network namespaces from unprivileged GyroidOS >> containers. > > I planned such change but as followup patchset to other > work which converts many structures to be per-netns. > > There is a RFC v2 patchset for reference: > > https://archive.linuxvirtualserver.org/html/lvs-devel/2023-12/index.html > > My goal was to isolate the different namespaces as much as > possible: different structures, different kthreads, etc. with the > goal to reduce the security risks of giving power to unprivileged roots. > Such isolation should help when namespaces are served from different CPUs. > > May be I should push fresh v3 soon, so that we can later use > GFP_KERNEL_ACCOUNT not only for services and dests but also > for allocations by schedulers, estimators, etc. The access to > sysctl vars should be enabled too, around comment > "Don't export sysctls to unprivileged users", > alloc_percpu => alloc_percpu_gfp(,GFP_KERNEL_ACCOUNT), > SLAB_ACCOUNT for kmem_cache_create, not sure about __GFP_NOWARN and > __GFP_NORETRY usage too. > > Not sure about the sysctl vars: now they are cloned from > init_net, do we give full access for writing, some can be privileged, > etc. Oh yes your right that I have missed. I think non-global per-netns sysctls should be save to be allowed for unprivileged roots. sysfs can only be mounted rw in a new private netns. Just unsharing the userns which then does not own the netns, means that sysfs can only be mounted read-only or the already mounted sysfs gets the overflow uids. > > I didn't push such changes yet because I'm not sure what > is needed: looks like, for now, what was needed is root from init_net to > control rules in different netns and there was no demand from the > virtualization world to extend this. If we can clearly define what is > good and what is bad from security perspective, we can go with such > changes after pushing the above patchset, i.e. the GENL_UNS_ADMIN_PERM > change should follow all other changes. Just to clarify the need to control ipvs from within non-init userns: In GyroidOS, we have only a minimal ramdisk-based system containing a container manager in the initial namespace. A first "core" container is already running in an unprivileged user namespace with a private netns. This container gets all physical network devices moved into its netns and is responsible for routing and all other networking features, ovs, batman, l2tp, etc... and also ipvs. Nevertheless, it makes sense to wait for your changes to be merged first. Thanx, Michael > >> Signed-off-by: Michael Weiß <michael.weiss@xxxxxxxxxxxxxxxxxxx> >> --- >> net/netfilter/ipvs/ip_vs_ctl.c | 36 +++++++++++++++++----------------- >> 1 file changed, 18 insertions(+), 18 deletions(-) >> >> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c >> index 143a341bbc0a..d39120c64207 100644 >> --- a/net/netfilter/ipvs/ip_vs_ctl.c >> +++ b/net/netfilter/ipvs/ip_vs_ctl.c >> @@ -1080,7 +1080,7 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest) >> return -EINVAL; >> } >> >> - dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL); >> + dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL_ACCOUNT); >> if (dest == NULL) >> return -ENOMEM; >> >> @@ -1421,7 +1421,7 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u, >> ret_hooks = ret; >> } >> >> - svc = kzalloc(sizeof(struct ip_vs_service), GFP_KERNEL); >> + svc = kzalloc(sizeof(struct ip_vs_service), GFP_KERNEL_ACCOUNT); >> if (svc == NULL) { >> IP_VS_DBG(1, "%s(): no memory\n", __func__); >> ret = -ENOMEM; >> @@ -4139,98 +4139,98 @@ static const struct genl_small_ops ip_vs_genl_ops[] = { >> { >> .cmd = IPVS_CMD_NEW_SERVICE, >> .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, >> - .flags = GENL_ADMIN_PERM, >> + .flags = GENL_UNS_ADMIN_PERM, >> .doit = ip_vs_genl_set_cmd, > ... > > Regards > > -- > Julian Anastasov <ja@xxxxxx>