On Mon, Jul 09, 2018 at 04:56:04PM -0400, Jason Baron wrote:
>
> On 07/05/2018 12:10 PM, Daniel P. Berrangé wrote:
> > On Thu, Jul 05, 2018 at 10:20:16AM -0400, Jason Baron wrote:
> >> Hi,
> >>
> >> Opening tap devices, such as macvtap, that are created in containers is
> >> problematic because the interface for opening tap devices is via
> >> /dev/tapNN, and devtmpfs is not typically mounted inside a container as
> >> it's not namespace aware. It is possible to do a mknod() in the
> >> container once the tap devices are created. However, since the tap
> >> devices are created dynamically, it's not possible to allow access to
> >> specific major/minor numbers a priori, because we don't know what they
> >> are going to be. In addition, it's desirable not to allow the mknod
> >> capability in containers. This behavior, I think, is somewhat
> >> inconsistent with the tuntap driver, where one can create tuntap
> >> devices inside a container by first opening /dev/net/tun and then
> >> supplying the tuntap device name via ioctl(TUNSETIFF). And since
> >> TUNSETIFF validates the network namespace, one is limited to opening
> >> network devices that belong to the current network namespace.
> >>
> >> Here are some options for this issue that I wanted to get feedback
> >> on; I'm also wondering whether anybody else has run into this.
> >>
> >> 1)
> >>
> >> Don't create the tap device (such as macvtap) in the container.
> >> Instead, create the tap device outside of the container and then move
> >> it into the desired container network namespace. In addition, do a
> >> mknod() for the corresponding /dev/tapNN device from outside the
> >> container before doing chroot().
> >>
> >> This solution still doesn't allow tap devices to be created inside the
> >> container. Thus, in the case of kubevirt, which runs libvirtd inside of
> >> a container, it would mean changing libvirtd to open existing tap
> >> devices (as opposed to the current behavior of creating new ones).
> >> This would not require any kernel changes, but as mentioned it seems
> >> inconsistent with the tuntap interface.
> >
> > Presumably the /dev/tapNN device name also changes when you rename
> > the tap device interface using SIOCSIFNAME?
> >
>
> I don't think so. The NN is the ifindex of the device; changing the
> device name does not affect the ifindex.

Ah right, that makes sense.

> > e.g. if it was /dev/tap24 in the host and you called SIOCSIFNAME(eth0)
> > when moving it into the container, it would be /dev/eth0 inside the
> > container?
> >
>
> When moving it into the container, the ifindex can change, since the
> ifindex range is per-namespace (not global).

Oh, that's interesting, I hadn't realized that.

> > Anyway, given that this /dev/tapNN approach is what exists today,
> > libvirt will likely want to implement support for it regardless
> > in order to support existing kernels.
>
> Ok, in this case whatever created the tap device outside of the
> container would pass the name of the device to libvirt and make sure
> that the /dev/tapNN device was set up correctly in the container. I
> believe this differs from how libvirt works today in that libvirt would
> need to be modified to open an existing device (I think it currently
> always creates new ones).

Libvirt can use a pre-created TAP device today, but not a pre-created
MACVTAP device, so supporting the latter is new code for us no matter
what.

> > One slight complication with either of the solutions above is that
> > libvirt won't know whether it is given a TAP or a MACVTAP device.
> > It'll only be given the device name. So with the code today we would
> > probably have to first try /dev/tapNN and, if that doesn't exist,
> > then try /dev/net/tun with TUNSETIFF.
> >
>
> Hmmm, doesn't libvirt make this distinction today?

No need to make the distinction yet, since we only support pre-created
TAP devices right now. In cases where we create the devices ourselves,
we already know which is which.
> > If we add a new /dev/net/tap, something that could seamlessly accept
> > either a TAP or MACVTAP NIC name would be nice.
> >
>
> I think if we added a new ioctl() as I proposed, it could accept either
> type of NIC.

OK, that would be nice.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list