RFC: disconnecting guest/domain interface config from host config (aka migration with macvtap)

Laine Stump <laine@xxxxxxxxx> · Tue, 12 Apr 2011 12:13:56 -0400

Abstraction of guest <--> host network connection in libvirt
=====================================

The <interface> element of a guest's domain config in libvirt has a 
<source> element that describes what resources on a host will be used to 
connect the guest's network interface to the rest of the world. This is 
very flexible, allowing several different types of connection (virtual 
network, host bridge, direct macvtap connection to physical interface, 
qemu usermode, user-defined via an external script), but currently has 
the problem that unnecessary details of the host config are embedded 
into the guest's config; if the guest is migrated to a different host, 
and that host has a different hardware or network config (or possibly 
the same hardware, but that hardware is currently in use by a different 
guest), the migration will fail.

I am proposing a change to libvirt's network XML that will allow us to 
(optionally - old configs will remain valid) remove the host details 
from the guest's domain XML (which can move around from host to host) 
and place them in the network XML (which remains with the host); the 
domain XML will then use existing config elements to associate each 
guest interface with a "network".

The motivating use case for this change is the "direct" connection type 
(which uses macvtap for vepa and vnlink connections directly between a 
guest and a physical interface, rather than through a bridge), but it is 
applicable to all types of connection. (Another hoped-for side effect of 
this change is that libvirt's network connection model becomes easier to 
realize on non-Linux hypervisors (eg, VMware ESX), so Mathias - please 
chime in!)

Background
--------------------

libvirt currently has 3 major types of guest interface connection (there 
are also "type='user'" and "type='ethernet'", but they probably wouldn't 
be used in a multi-host environment, so I'm not considering them here):

1) type='network'

The guest's network interface is connected to a libvirt-created "virtual 
network", which is in reality (in the case of KVM or Xen) a Linux bridge 
device that isn't connected to any physical host interface - any 
connection to the outside goes through the host's IP routing stack.

The network to use is indicated in the <source> element of the guest's 
interface xml: <source network='mynetwork'/>. Because the name 
'mynetwork' is controlled by libvirt, it's perfectly reasonable to 
assume that the same network name could be available on another host 
that is accepting a migrated guest.

2) type='bridge'

The guest's network interface is connected to a bridge device (eg "br0") 
that has already been configured in the host's network config files (eg, 
in /etc/sysconfig/network-scripts). This bridge is itself connected to 
the outside via a physical host interface, eg "eth0", *NOT* through the 
host's IP routing stack.

The bridge to use is indicated in the <source> element with <source 
bridge='br0'/>. Although the naming of the bridge is outside the scope 
of libvirt, it is at least possible to set up all hosts to have the same 
bridge name (so that a guest could be migrated from one host to another).

3) type='direct'

The guest's network interface is connected directly to a physical 
interface (eg "eth0") with macvtap, or sometimes to a virtual function 
("VF") of a physical interface (which is also really just another 
interface, from the software point of view).

The interface to use is indicated with <source dev='eth0' 
mode='something'/>. In this case, the interface name is determined by the 
host OS and cannot be arbitrarily changed. Also, a host will often have 
multiple interfaces / VFs available to guests, and in some modes may 
allow only a single guest to connect to a given interface (implying that 
the interface used by a guest when on one host will probably not be 
available when migrating to another). So in order to have flexible 
migration from one host to another, an abstraction to allow the guest 
XML to use the same name on all hosts must be introduced.
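
For reference, the three connection types above look roughly like this in 
a guest's domain XML today. This is only a minimal sketch: the network, 
bridge, and device names are the example names used above, mode='vepa' is 
just one of the possible macvtap modes, and everything else an interface 
definition would normally carry (MAC address, model, etc) is left out. The 
second and third forms both name a host-side device, but only the third 
names one that cannot be freely chosen by the admin.

```xml
<devices>
  <!-- 1) libvirt virtual network, referenced by its libvirt-controlled name -->
  <interface type='network'>
    <source network='mynetwork'/>
  </interface>

  <!-- 2) host bridge; the name has to be kept the same across hosts by hand -->
  <interface type='bridge'>
    <source bridge='br0'/>
  </interface>

  <!-- 3) direct macvtap connection; 'eth0' is a host-specific device name -->
  <interface type='direct'>
    <source dev='eth0' mode='vepa'/>
  </interface>
</devices>
```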

Three possible methods for providing this abstraction come to mind:

Option 1
-----------

(Be forewarned that Options 1 & 2 are shown here mainly to illustrate my 
thought process while arriving at my preferred option, Option 3 :-)

In a manner similar to the way the vnet%d tap devices are created, name 
the interface with an embedded variable (eg "eth%d") (plus attributes 
for min and max %d) and let the underlying code in libvirt search 
for/reserve an appropriate device.

This is the simplest to code/configure, but does not allow a) more 
complex names (eg, interface names as determined by biosdevname can be 
of the form "pci%dp%d_%d"), b) multiple ranges, c) oversubscribing of 
interfaces (it is possible, although sub-optimal, to connect multiple 
guest interfaces to a single host interface with macvtap).
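
As a purely hypothetical sketch of Option 1 (the min and max attribute 
names are made up here for illustration, and libvirt does not accept a 
pattern in the dev attribute), the guest XML might have looked something 
like this:

```xml
<interface type='direct'>
  <!-- HYPOTHETICAL: 'eth%d' pattern plus min/max bounds; not valid libvirt XML -->
  <source dev='eth%d' min='0' max='7' mode='vepa'/>
</interface>
```

A single pattern with one range like this has no obvious way to express 
"eth0-eth3 and eth8-eth11", which is limitation b) above.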

VERDICT: looks ugly, not flexible enough.

Option 2
-----------

Create a new class of libvirt XML config to describe a pool of network 
interfaces, and reference this pool in the guest interface element:

<interface type='interfacePool'>
<source pool='red-network'/>
         ...
</interface>

The problem with this is that it requires a new API for defining, 
undefining, and otherwise managing "interface pools". Also, it 
wouldn't allow (for example) one host to use a pool of macvtap interfaces 
to connect guests, and another host to use a host bridge for the same 
connection (obviously, such a non-uniform setup wouldn't be desirable in 
a large host farm, but may be encountered in some smaller setups).
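
The pool definition itself is not spelled out above. Purely as an 
illustrative assumption (none of these element or attribute names exist in 
libvirt), a 'red-network' pool object on one host could be imagined as:

```xml
<!-- HYPOTHETICAL: no such top-level object exists in libvirt -->
<interfacePool>
  <name>red-network</name>
  <interface dev='eth0'/>
  <interface dev='eth4'/>
  <interface dev='eth18'/>
</interfacePool>
```

Each host would carry its own definition under the same pool name, listing 
whatever devices it has available.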

VERDICT: creates more API clutter (ie extra work *and* confusion for 
users). It is "flexible enough" for the current motivation, but unnecessarily 
limiting, eg it doesn't help the model to be more easily adapted to VMware etc.

Option 3
-----------

Up to now we've only discussed the need for separating the host-specific 
config (<source> element) in the case of type='direct' interfaces (well, 
in reality I've gone back and edited this document so many times that this is 
no longer true, but play along with me! :-). But it really is a problem 
for all interface types - all of the information currently in the 
guest's interface <source> element really is tied to the host, and 
shouldn't be defined in detail in the guest XML; it should instead be 
defined once for each host, and only referenced by some name in the 
guest XML; that way as a guest moves from host to host, it will 
automatically adjust its connection to match the new environment.

As a more general solution, instead of having the special new 
"interfacePool" object in the config, what if the XML for "network" was 
expanded to mean "any type of guest network connection" (with a new 
"type='xxx'" attribute at the toplevel to indicate which type), not just 
"a private bridge optionally connected to the real world via routing/NAT"?

If this was the case, the guest interface XML could always be, eg:

<interface type='network'>
<source network='red-network'/>
          ...
</interface>

and depending on the network config of the host the guest was migrated 
to, this could be either a direct (macvtap) connection via an interface 
allocated from a pool (the pool being defined in the definition of 
'red-network'), a bridge (again, pointed to by the definition of 
'red-network'), or a virtual network (using the current network 
definition syntax). This way the same guest could be migrated not only 
between macvtap-enabled hosts, but from there to a host using a bridge, 
or maybe a host in a remote location that used a virtual network with a 
secure tunnel to connect back to the rest of the red-network. (Part of 
the migration process would of course check that the destination host 
had a network of the proper name, and fail if it didn't; management 
software at a level above libvirt would probably filter a list of 
candidate migration destinations based on available networks, and only 
attempt migration to one that had the matching network available).

Examples of 'red-network' for different types of connections (all of 
these would work with the interface XML given above):

<!-- Existing usage - a libvirt virtual network -->
<network> <!-- (you could put "type='virtual'" here for symmetry) -->
<name>red-network</name>
<bridge name='virbr0'/>
<forward mode='route'/>
      ...
</network>

<!-- The simplest - an existing host bridge -->
<network type='bridge'>
<name>red-network</name>
<bridge name='br0'/>
</network>

<network type='direct'>
<name>red-network</name>
<source mode='vepa'>
<pool>
         ...
</pool>
</source>
</network>

I know there may be some resistance to this expansion of the usage of 
<network>, but I think it does fit in with the current usage properly, 
and is preferable to adding an entire new class of API just to define a 
pool of interfaces.

Open questions:

1) What should the <pool> element inside network/source look like? 
Making each interface in the pool a separate element, with possible 
attributes, would be the simplest to code, but would get tedious on a 
system with, for example, an ethernet card with 64 VFs. On the other 
hand, just parameterizing a string (eth%d) is inadequate, eg, when there 
are multiple non-contiguous ranges.

2) Do we need a "max connections" for each interface in a pool of 
macvtap interfaces? Or should we just overload them in a round-robin 
fashion unless mode='passthru' (a new mode which requires only one guest 
per interface)?

3) What about the parameters in the <virtualport> element that are 
currently used by vepa/vnlink? Do those belong with the host, or with 
the guest?

4) Are there other <network> types that we want? Perhaps the recent 
proposal for IPSec / secure tunnels could be incorporated as a new 
network type (or maybe it could just be the standard "virtual" type, 
with a tunnel as the forward device).
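
To make questions 1 and 2 a bit more concrete, here is one way the 
alternatives could be written down. Everything inside <pool> below is an 
assumption for the sake of discussion: the <interface> and <range> children 
and the prefix, start, end, and maxConnections attributes are invented 
names, not existing libvirt syntax.

```xml
<network type='direct'>
  <name>red-network</name>
  <source mode='vepa'>
    <pool>
      <!-- HYPOTHETICAL: one element per interface, with a per-interface limit -->
      <interface dev='eth0' maxConnections='1'/>
      <interface dev='eth4' maxConnections='4'/>
      <!-- HYPOTHETICAL: compact ranges, repeatable for non-contiguous sets -->
      <range prefix='eth' start='8' end='15'/>
      <range prefix='eth' start='32' end='63'/>
    </pool>
  </source>
</network>
```

Allowing several <range> elements would cover the non-contiguous case 
without listing 64 VFs one by one, but it would still not handle 
multi-number names such as the biosdevname form mentioned under Option 1.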
