On Sun, Aug 30, 2020 at 3:39 PM Marcus <shadowsor@xxxxxxxxx> wrote:
> On Tue, Jun 30, 2020 at 04:02:05PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Jun 30, 2020 at 12:59:03PM +0200, Miguel Duarte de Mora Barroso wrote:
> > > On Mon, Apr 6, 2020 at 4:03 PM Laine Stump <lstump redhat com> wrote:
> > > >
> > > > On 4/6/20 9:54 AM, Daniel P. Berrangé wrote:
> > > > > On Mon, Apr 06, 2020 at 03:47:01PM +0200, Miguel Duarte de Mora Barroso wrote:
> > > > >> Hi all,
> > > > >>
> > > > >> I'm aware that it is possible to plug pre-created macvtap devices to
> > > > >> libvirt guests - tracked in RFE [0].
> > > > >>
> > > > >> My interpretation of the wording in [1] and [2] is that it is also
> > > > >> possible to plug pre-created tap devices into libvirt guests - that
> > > > > >> would be a requirement to allow kubevirt to run with fewer capabilities
> > > > >> in the pods that encapsulate the VMs.
> > > > >>
> > > > >> I took a look at the libvirt code ([3] & [4]), and, from my limited
> > > > >> understanding, I got the impression that plugging existing interfaces
> > > > > >> via `managed='no'` is only possible for macvtap interfaces.
> > > >
> > > >
> > > > No, it works for standard tap devices as well.
> > > >
> > > >
> > > > The reason the BZs and commit logs talk mostly about macvtap rather than
> > > > tap is because 1) that's what kubevirt people had asked for and 2) it
> > > > already *mostly* worked for tap devices, so most of the work was related
> > > > to macvtap. (My memory is already fuzzy, but I think there were a couple of
> > > > privileged operations we still tried to do for standard tap devices even
> > > > if they were precreated. Standard disclaimer: I often misremember, so
> > > > this memory could be wrong! But precreated tap devices definitely do work.)
> > > >
> > >
> > > It's been a while since I started this thread, but lately I've
> > > understood better how tap devices work, and that new insight makes me
> > > wonder about a couple of things.
> > >
> > > Our ultimate goal in kubevirt is for a kubernetes pod that doesn't have
> > > the NET_ADMIN capability to consume a pre-created tap device.
> > >
> > > After looking at the current libvirt code, I don't think that is
> > > currently supported, since we'll *always* enter the
> > > `virNetDevTapCreate` function in [1] (I'm interested in the *tap*
> > > scenario).
> > >
> > > The tap device is effectively created in that function - [2] - by
> > > opening the clone device (/dev/net/tun) and calling `ioctl(fd,
> > > TUNSETIFF, ...)` on it. AFAIK, both of those operations *require* the
> > > NET_ADMIN capability. If I'm correct, this means that the current
> > > libvirt implementation makes our goals impossible to achieve.
> >
> > AFAIK, that is not correct - CAP_NET_ADMIN isn't required to open
> > or create a tap device - only to add the tap device to a bridge.
> >
> > So if you create the tap device & attach it to a bridge ahead of
> > time, libvirt should then be able to open it and give it to QEMU.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/tun.c#n586
>
> ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
> (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
> !ns_capable(net->user_ns, CAP_NET_ADMIN);
>
>
> This is called by the TUNSETIFF code.
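To make the quoted condition easier to reason about, here is a userspace model of it. This is an illustration only. `tap_attach_denied`, `struct tap_perm_model`, and the use of `(uid_t)-1` / `(gid_t)-1` to mean "not set" are assumptions standing in for the kernel's kuid_t/kgid_t and `uid_valid()` / `gid_valid()`. A `true` result corresponds to TUNSETIFF on an existing device failing with EPERM.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-ins for "no owner/group recorded on the device". */
#define NO_OWNER ((uid_t)-1)
#define NO_GROUP ((gid_t)-1)

/* Hypothetical model of the per-device state the kernel check looks at. */
struct tap_perm_model {
    uid_t owner;   /* NO_OWNER when no owner was set */
    gid_t group;   /* NO_GROUP when no group was set */
};

/* Mirrors the quoted expression: refuse if an owner is set and the caller's
 * euid differs, or a group is set and the caller is not in it, unless the
 * caller has CAP_NET_ADMIN in the device's network namespace. */
bool tap_attach_denied(const struct tap_perm_model *tap, uid_t euid,
                       bool caller_in_group, bool has_cap_net_admin)
{
    bool owner_mismatch = tap->owner != NO_OWNER && tap->owner != euid;
    bool group_mismatch = tap->group != NO_GROUP && !caller_in_group;
    return (owner_mismatch || group_mismatch) && !has_cap_net_admin;
}
```

Note what the model implies: if neither an owner nor a group was ever recorded on the device, this particular check refuses nobody. It only starts to matter once an owner or group is set, and other hooks (for example an LSM) can still refuse the attach independently.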
>
> AFAICT, that means if you fchown(tapfd, uid, gid) to the uid+gid of
> libvirtd, it should not require CAP_NET_ADMIN.
>
> Regards,
> Daniel

I have no idea if this message will get linked into the thread properly, but I came across this and wanted to comment on the mystery without having the original email (or its headers) to reply to.

I recently ran into this issue as well, and found that even *with* NET_ADMIN at the container level, launching QEMU directly fails with:

qemu-system-x86_64: -netdev tap,id=hostnet0,ifname=tap0: could not configure /dev/net/tun (tap0): Permission denied

So, libvirt aside, QEMU is trying to do this as well:
https://github.com/qemu/qemu/blob/0982a56a551556c704dc15752dabf57b4be1c640/net/tap-linux.c#L104

But it's unclear where in the kernel's tun_set_iff() the EACCES ("Permission denied") is coming from. Of note, if I give QEMU a non-existing tap name, it creates it, but if I give it an existing tap name, I get EACCES.
That was quick: it turns out this other issue is SELinux-related. The error comes from security_tun_dev_open(), which tun_set_iff() calls when attaching to an existing device and which ultimately calls selinux_tun_dev_open().
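The QEMU error quoted above ends in "Permission denied", which is the text for EACCES. The tun ownership check fails with EPERM ("Operation not permitted") instead, and SELinux denials generally surface as EACCES, so the errno alone already points at which check refused the attach. The small triage helper below is sketched on that assumption: `tunsetiff_failure_hint` is hypothetical, and its strings are hints rather than diagnoses.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map the errno from a failed TUNSETIFF on an
 * already-existing tap device to the kernel check that most likely
 * produced it. */
const char *tunsetiff_failure_hint(int err)
{
    switch (err) {
    case EPERM:   /* "Operation not permitted" */
        return "tun ownership check: not owner/group and no CAP_NET_ADMIN";
    case EACCES:  /* "Permission denied" */
        return "likely an LSM denial, e.g. SELinux via security_tun_dev_open";
    case EINVAL:
        return "requested flags do not match the existing device (tun vs tap, multi-queue)";
    default:
        return strerror(err);
    }
}
```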