Re: [libvirt PATCH 0/3] network: support NAT with IPv6

Daniel P. Berrangé <berrange@xxxxxxxxxx> · Tue, 9 Jun 2020 17:27:39 +0100

On Mon, Jun 08, 2020 at 11:05:00PM -0400, Laine Stump wrote:
> On 6/8/20 10:51 AM, Daniel P. Berrangé wrote:
> > The virtual network has never supported NAT with IPv6 since this feature
> > didn't exist at the time. NAT has been available since RHEL-7 vintage
> > though, and it is desirable to be able to use it.
> > 
> > This series enables it with
> > 
> >    <forward mode="nat">
> >      <nat ipv6="yes"/>
> >    </forward>
> 
> I've had this lurking on my "this is something I should do" list for a long
> time, but couldn't decide on the best name in XML (and also figured that the
> problem with accept_ra needed to be fixed first), so it never got to the
> top. So I'm glad to see you've done it, disappointed in myself that I never
> did it :-/
> 
> I like your XML knob naming better than what I'd considered. I had thought
> of having <forward mode='supernat'> (or some other more reasonable extra
> mode), but your proposal is more orthogonal and matches with the existing
> ipv6='yes' at the toplevel of <network> (which is used to enable ipv6
> traffic between guests on the bridge even when there are no IPv6 addresses
> configured for the network.)

I considered mode="nat6" as an alternative, but it would have meant
updating many switch() statements, and it is somewhat misleading as a
name.

> >    </network>
> > 
> > Conceptually this means
> > 
> >   - Try to gimme a subnet with IPv4 and DHCP
> >   - Try to gimme a subnet with IPv6 and RAs
> > 
> > Now when we start the virtual network
> > 
> >   - If IPv4 is not enabled on host, don't assign addr
> 
> What will we use to check for this? Not just "no IP addresses configured", I
> guess, since it may be the case that libvirt has just happened to come up
> before NM or whoever has started any networks. (or maybe someone wants to
> use IPv6 on a libvirt virtual network, but have no IPv6 connectivity beyond
> the host).

IIUC, we can simply check whether it is possible to create a socket
with AF_INET or AF_INET6.  If the kernel supports it, then this
should succeed, even if network manager isn't running yet.
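
To illustrate the probe, a minimal sketch in Python (libvirt itself would
do this in C; the helper name family_supported is made up for this mail):

```python
import socket


def family_supported(family: int) -> bool:
    """Return True if the kernel lets us create a socket of this family."""
    try:
        sock = socket.socket(family, socket.SOCK_DGRAM)
    except OSError:
        # Typically EAFNOSUPPORT when the address family is unavailable.
        return False
    sock.close()
    return True


have_ipv4 = family_supported(socket.AF_INET)
have_ipv6 = family_supported(socket.AF_INET6)
```

No address needs to be configured for this to work, since creating the
socket only asks the kernel whether the protocol family is present.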

> >   - Else
> >     - Iterate N=1..254 to find a free range for IPv4
> >     - Use 192.168.N.0/24 for subnet
> >     - Use 192.168.N.1 for host IP
> >     - Use 192.168.N.2 -> 192.168.N.254 for guest DHCP
> > 
> >   - If IPv6 is not enabled on host, don't assign addr
> >   - Else
> >     - Generate NNNN:NNNN as 4 random bytes
> >     - Use fd00:add:f00d:NNNN:NNNN::0/64 for IPv6 subnet
> >     - Use fd00:add:f00d:NNNN:NNNN::1 for host IP
> >     - Use router advertisement for IPv6 zero-conf
> > 
> > With NNNN:NNNN, even with 1000 guests running, we have just a 0.02%
> > chance of clashing with a guest for IPv6.
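
For the IPv6 half, a rough Python sketch of what I mean by NNNN:NNNN
(illustrative only; random_ipv6_host is a made-up name, not libvirt
code):

```python
import ipaddress
import secrets


def random_ipv6_host():
    """Return the host interface fd00:add:f00d:NNNN:NNNN::1/64."""
    raw = secrets.token_bytes(4)            # the 4 random bytes
    hi, lo = raw[:2].hex(), raw[2:].hex()   # first and second NNNN group
    return ipaddress.ip_interface(f"fd00:add:f00d:{hi}:{lo}::1/64")


host = random_ipv6_host()
print(host.ip, host.network)
```

Note that with a /64 prefix length only the first NNNN group is part of
the network prefix; the second group lands in the interface half of the
address, which is what host.network shows.
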
> > 
> > The "live" XML would always reflect the currently assigned addresses
> > 
> > Proactively monitor the address allocations of the host. If we see
> > a conflicting address appear, take down the dnsmasq instance, generate
> > a new subnet, bring dnsmasq back online.
> 
> Hmm. How would you see this monitoring happening? We couldn't do it with an
> external script like I had done for simple "shut down on conflict" without
> adding extra functionality to libvirt's network driver. We *could* go back
> to the idea of monitoring netlink change messages ourselves within libvirtd
> and doing it all internally ourselves. Or maybe the NM script I proposed
> could go beyond simply destroying conflicting networks, and also restart any
> network that had autoaddr='yes'; to make this fully functional we would need
> to finally put in the proper stuff so that tap devices (and the underlying
> emulated NICs) would be set offline when their connected network was
> destroyed, and then reconnected/set online when the network was re-started.
> Getting the networks to behave this way would be useful in general anyway,
> even without thinking about the conflicting-networks problem. The one
> downside of externally controlling renumbering-on-conflict using an external
> script is that it would only work with NetworkManager...

Yeah, I'm trying to remember now why we went the NM hook route, rather
than listening for netlink events. I guess NM is much simpler to hook
into.  I'd honestly not thought about this too much though - just having
an automatically numbered network will already be a huge step forward
compared to current day.

In particular if we instituted a rule that if we are NOT on a hypervisor,
we count from N=254 -> 0, when picking 192.168.N.0, and count from
N=0 -> 254 when we are on a hypervisor, then we'll trivially avoid the
host/guest clash in the simple case, even if the network is not yet online.
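
To make the counting rule concrete, a rough sketch in Python (not libvirt
code; pick_ipv4_subnet and its arguments are invented for illustration,
and how we'd learn the in-use subnets is left open):

```python
import ipaddress


def pick_ipv4_subnet(in_use, on_hypervisor):
    """Pick the first 192.168.N.0/24 that overlaps nothing in in_use.

    in_use: iterable of ipaddress.IPv4Network already seen on the host.
    on_hypervisor: True counts N up from 0, False counts down from 254.
    """
    in_use = list(in_use)
    candidates = range(0, 255) if on_hypervisor else range(254, -1, -1)
    for n in candidates:
        subnet = ipaddress.ip_network(f"192.168.{n}.0/24")
        if not any(subnet.overlaps(used) for used in in_use):
            return subnet
    return None  # every candidate clashes with something


used = [ipaddress.ip_network("192.168.0.0/24")]
print(pick_ipv4_subnet(used, on_hypervisor=True))
print(pick_ipv4_subnet(used, on_hypervisor=False))
```

The two directions start at opposite ends of the range, so they only meet
if nearly every /24 is already taken.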

Don't anyone dare mention nested virt with 3 levels of libvirt... 

Seriously though, even without automatic teardown & restart, we'd
be way better off by simply not hardcoding 192.168.N.0 at RPM
install time, when the network env is not the same as the run time
network env, e.g. cloud images.

> > Ideally we would have to bring the guest network links offline and
> > then online again to force DHCP re-assignment immediately.
> 
> Yeah, I think it really makes sense that when a libvirt network is
> destroyed, all the tap devices are set offline, and the emulated NICs are
> set offline as well; then when a libvirt network is started, we would go
> through all devices that are supposed to be connected to that network,
> reconnect the taps, set them online, and set the emulated NIC online. We
> currently do the reconnection part when libvirtd is restarted but can't do
> it immediately when a *network* is restarted because the network driver has
> no access to the list of active guests and their interfaces....
> 
> Hmm, we do now maintain the list of ports for each network though, and it
> would be possible to expand that to keep the name of the tap device
> associated with the port in addition to the other info (e.g. whether or not
> the NIC has been set offline via an API call), *but* when a network is
> destroyed, all ports registered with that network are also destroyed, so
> just expanding the attributes for the ports isn't going to get us where we
> need. So, do we want to 1) change it to maintain active ports for a network
> when it is destroyed so that they can be easily reactivated when the network
> is restarted? Or do we want to 2) change the network driver to make calls to
> all registered hypervisor drivers during a net-start to look for all guest
> interfaces that think they are connected to the network? The former sounds
> much more efficient, but I don't know how "dirty" it seems to maintain state
> for something that has been "destroyed"...
> 
> Or maybe we instead need to also add a new API for networks
> virNetworkReconnect(), which will use newly expanded info in the network
> ports list to reconnect all guest interfaces.

Responsibility for enslaving a TAP device into a bridge still lives with
the virt drivers, not the network driver.

The virt drivers could listen for lifecycle events from the network driver
and auto-reconnect.

Alternatively the virt driver could listen for netlink events and see
virbr0 being deleted and re-created by the kernel.
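
As a sketch of the netlink option, this Python snippet decodes link
add/remove notifications from a raw rtnetlink datagram. It is only meant
to show the shape of the data; link_events is a made-up helper, and a
real virt driver would use C and libnl rather than hand parsing.

```python
import struct

RTM_NEWLINK = 16   # link created or changed
RTM_DELLINK = 17   # link removed
IFLA_IFNAME = 3    # attribute carrying the interface name

NLMSGHDR = struct.Struct("=IHHII")    # len, type, flags, seq, pid
IFINFOMSG = struct.Struct("=BxHiII")  # family, pad, type, index, flags, change
RTATTR = struct.Struct("=HH")         # len, type


def align4(n):
    """Round n up to the 4-byte netlink alignment."""
    return (n + 3) & ~3


def link_events(data):
    """Yield ("new" | "del", ifname) for each link message in data."""
    offset = 0
    while offset + NLMSGHDR.size <= len(data):
        msg_len, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(data, offset)
        if msg_len < NLMSGHDR.size or offset + msg_len > len(data):
            break  # truncated or malformed message
        if msg_type in (RTM_NEWLINK, RTM_DELLINK):
            pos = offset + NLMSGHDR.size + IFINFOMSG.size
            end = offset + msg_len
            while pos + RTATTR.size <= end:
                attr_len, attr_type = RTATTR.unpack_from(data, pos)
                if attr_len < RTATTR.size or pos + attr_len > end:
                    break
                if attr_type == IFLA_IFNAME:
                    raw = data[pos + RTATTR.size:pos + attr_len]
                    name = raw.rstrip(b"\0").decode()
                    yield ("new" if msg_type == RTM_NEWLINK else "del", name)
                pos += align4(attr_len)
        offset += align4(msg_len)
```

On Linux the datagrams would come from an AF_NETLINK / NETLINK_ROUTE
socket bound to the RTMGRP_LINK multicast group; the driver would then
react when it sees a "del" followed by a "new" for the bridge name.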

> On a different sub-topic - it would be nice to provide some stability to the
> subnet used for an autoaddr='yes' network (think of the case where every
> time a host is booted, libvirt starts its default network when
> 192.168.122.0/24 is available, but then a short time later a host interface
> is always started on the same subnet - that would mean every time the host
> booted the exact same destabilizing dance would take place even though it
> would be pretty easy to predict the eventually-used subnet based on past
> experience).
> 
> Although we historically have avoided automatic changes to libvirt config
> files by libvirtd itself as much as possible (the only cases I can think of
> are when we're modifying the config to take care of some compatibility
> problem after an upgrade), what do you think about having the autoaddr='yes'
> networks automatically update the config with the current subnet info?
> (maybe this would need to only be done if not starting from a live image or
> something, or maybe it should just always be done). This would then be used
> as the first guess the next time the network was started. That way we would
> avoid the need to delay starting libvirt networks until after host
> networking was fully up; the subnet might bounce around a bit that first
> time, but once a stable address was found during that first run, it would
> then be used from the get-go during all subsequent boots (until/unless
> something changed and it had to be changed yet again).

We could stash the previously chosen subnet in /var/cache/libvirt/network
or /var/lib/libvirt/network, so there is no need to modify the inactive XML
config. This is like how dnsmasq "remembers" DHCP leases previously given
for guests.
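
Something along these lines, sketched in Python (the JSON layout, the
"<name>.last-subnet.json" file name and both helper names are
assumptions made up for this example, not an existing libvirt format):

```python
import json
from pathlib import Path


def _state_file(state_dir, network):
    return Path(state_dir) / f"{network}.last-subnet.json"


def save_last_subnet(state_dir, network, subnet):
    """Remember the subnet last used by this network."""
    path = _state_file(state_dir, network)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"subnet": subnet}))


def load_last_subnet(state_dir, network):
    """Return the remembered subnet, or None if there is no usable record."""
    try:
        return json.loads(_state_file(state_dir, network).read_text())["subnet"]
    except (OSError, ValueError, KeyError, TypeError):
        return None
```

On network start the stored value would just be the first guess; if it
clashes, we fall back to the normal search and overwrite the record.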

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|