Hi,

Dan and I have been discussing how to "fix networking", not just
Xen's networking but also getting something sane wrt. QEMU/KVM etc.
Comments very welcome on the writeup below.

The libvirt stuff is towards the end, but I think all of it is
probably useful to this list.

Cheers,
Mark.

                        Virtual Networking

The ability to manage virtual machines is something which is receiving
a lot of focus right now. Xen, KVM, QEMU and others provide the
infrastructure required to run a virtual machine, and each can provide
guests with a virtual network interface. This proposal addresses the
problem of how guests are networked together.

We aim:

  * To make virtual networking "just work". Guests should be able to
    communicate with each other, their host and the Internet without
    any fuss or configuration. This should be the case even with
    laptops and offline machines.

  * To allow greater flexibility in how guests are networked. It
    should be possible to isolate groups of guests in different
    networks, allow guests on different physical machines to
    communicate, firewall guests' networks from physical networks or
    allow guests to appear just like physical machines on physical
    networks.

  * To make networking virtual machines analogous with networking
    physical machines.

  * To support inter-networking between virtualisation technologies.

User Visible Concepts
=====================

It's important to consider the manner in which we expose the
functionality of virtual networking. What concepts will we be exposing
through the UI? Are those concepts well defined and consistent? Are
those concepts more complex than necessary? Or are they too simple to
support the functionality we want?

Real world, or "physical", concepts[1]:

  * Network - a number of interconnected machines.

  * Network Interface - hardware which enables a machine to connect to
    a network.

  * Bridge - hardware which enables the interconnection of machines to
    form a network.
    Bridges can also be connected to other bridges to form a larger
    network.

  * Router - hardware which connects two or more distinct networks,
    allowing machines on different networks to communicate with one
    another. Sometimes a router and a bridge are available as a
    combined piece of hardware - the bridge forms a network and the
    router connects that network to another distinct network.

  * Firewall - software on a router which can be used to control how
    machines on an "external" network (e.g. the Internet) can
    communicate with machines on an "internal" network. For a given
    type of connection, you can choose to disallow connections of
    that type or forward them to a specific internal machine. Can
    also be used to control how internal machines can communicate
    with external machines.

With virtual networking, we will be exposing the following "virtual"
concepts:

  * Virtual Network - a number of interconnected virtual machines.

  * Virtual Network Interface - a network interface in a virtual
    machine.

  * Virtual Bridge - allows the interconnection of virtual machines to
    form a virtual network. A virtual bridge may be configured to also
    act as a virtual router and firewall. A virtual bridge may also be
    connected to another virtual bridge (perhaps on another physical
    machine) to create a larger virtual network.

(Note, unprivileged users may create any of the above)

Finally, where the physical world meets the virtual world:

  * Shared Physical Interface - if a physical interface is configured
    to be "shared", then any number of virtual interfaces may be
    connected to it, allowing virtual machines to be connected to the
    same physical network which the physical interface is connected
    to. Only privileged users may configure a physical interface to
    be shared and/or connect guests to it.

There are a few problems with all of the above:

  1. The distinction between a bridge and a router requires a lot of
     technical knowledge to fully understand. However, the model of e.g.
     a LinkSys router is familiar to a lot of people - a box which
     allows you to network your machines together and connect that
     network to (and firewall off) the Internet.

  2. This "shared physical interface" notion is very "makey upey". We
     could perhaps talk about the idea in terms of connecting a
     physical interface to a virtual bridge, but it exposes the bridge
     vs. router distinction more than we'd like.

  3. Guests are connected to a specific physical interface, whereas
     perhaps users wish guests to be connected to "the network" - i.e.
     if NetworkManager switched from wireless to wired while remaining
     on the same subnet, perhaps we'd like to automatically switch the
     bridge to the new network. In reality, though, bridged networking
     is only really sane for machines on a fairly static network
     connection.

[1] - Yes, these definitions aren't entirely accurate, but they
      describe the kind of understanding a moderately technical user
      might have of the concepts.

Example Networks
================

Below are some example networks users may configure and an explanation
of how each network would be implemented in practice.

  1. A privileged user creates two (Xen) guests, each with a Virtual
     Network Interface. Without any special networking configuration,
     these two guests are connected to a default Virtual Network which
     contains a combined Virtual Bridge/Router/Firewall.

+-----------+                 D             +-----------+
|   Guest   |             N D H             |   Guest   |
|     A     |             A N C             |     B     |
|   +---+   |             T S P             |   +---+   |
|   |NIC|   |             ^ ^ ^             |   |NIC|   |
+---+-+-+---+           +---+---+           +---+-+-+---+
      ^                     |                     ^
      |   +--------+    +---+---+    +--------+   |
      +-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+
          +--------+    +-------+    +--------+

     Notes:

       * "vnbr0" is a bridge device with its own IP address on the
         same subnet as the guests.
       * IP forwarding is enabled in Dom0. Masquerading and DNAT are
         implemented using iptables.
       * We run a DHCP server and a DNS proxy in Dom0 (e.g. dnsmasq)

  2. A privileged user does exactly the same thing as (1), but with
     QEMU guests.

                              D
                          N D H
                          A N C
                          T S P
                          ^ ^ ^
                        +---+---+
                            |
                        +---+---+
+-----------+           | vnbr0 |           +-----------+
|   Guest   |           +---+---+           |   Guest   |
|     A     |               |               |     B     |
|   +---+   |           +---+---+           |   +---+   |
|   |NIC|   |           | vtap0 |           |   |NIC|   |
+---+-+-+---+           +---+---+           +---+-+-+---+
      ^       +-------+     |     +-------+       ^
      |       |       | +---+---+ |       |       |
      +------>+ VLAN0 +-+  VDE  +-+ VLAN0 +<------+
              |       | +-------+ |       |
              +-------+           +-------+

     Notes:

       * VDE is a userspace ethernet bridge implemented using
         vde_switch
       * "vtap0" is a TAP device created by vde_switch
       * Everything else is the same as (1)
       * This could be done without vde_switch by having Guest A
         create vtap0 and have Guest B connect directly to Guest A's
         VLAN. However, if Guest A is shut down, Guest B's network
         would go down.

  3. An unprivileged user does exactly the same thing as (2).

+-----------+                               +-----------+
|   Guest   |          +----+----+          |   Guest   |
|     A     |          |userspace|          |     B     |
|   +---+   |          | network |          |   +---+   |
|   |NIC|   |          |  stack  |          |   |NIC|   |
+---+-+-+---+          +----+----+          +---+-+-+---+
      ^       +-------+     |     +-------+       ^
      |       |       | +---+---+ |       |       |
      +------>+ VLAN0 +-+  VDE  +-+ VLAN0 +<------+
              |       | +-------+ |       |
              +-------+           +-------+

     Notes:

       * Similar to (2) except there can be no TAP device or bridge
       * The userspace network stack is implemented using slirpvde to
         provide a DHCP server and DNS proxy to the network, but also
         effectively a SNAT and DNAT router.
       * slirpvde implements ethernet, ip, tcp, udp, icmp, dhcp, tftp
         (etc.) in userspace. Completely crazy, but since the kernel
         apparently has no secure way to allow unprivileged users to
         leverage the kernel's network stack for this, it must be
         done in userspace.

  4. Same as (2), except the user also creates two Xen guests.

+-----------+                 D             +-----------+
|   Guest   |             N D H             |   Guest   |
|     A     |             A N C             |     B     |
|   +---+   |             T S P             |   +---+   |
|   |NIC|   |             ^ ^ ^             |   |NIC|   |
+---+-+-+---+           +---+---+           +---+-+-+---+
      ^                     |                     ^
      |   +--------+    +---+---+    +--------+   |
      +-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+
          +--------+    +---+---+    +--------+
                            |
                        +---+---+
                        | vtap0 |
                        +---+---+
                            |
            +-------+    +--+--+   +-------+
      +---->+ VLAN0 +----+ VDE +---+ VLAN0 +<-----+
      |     +-------+    +-----+   +-------+      |
      V                                           V
+---+-+-+---+                               +---+-+-+---+
|   |NIC|   |                               |   |NIC|   |
|   +---+   |                               |   +---+   |
|   Guest   |                               |   Guest   |
|     C     |                               |     D     |
+-----------+                               +-----------+

     Notes:

       * In this case we could do away with VDE and have each QEMU
         guest use its own TAP device.

  5. Same as (4) except Guests A and C are connected to a Shared
     Physical Interface.

+-----------+                 |        D         +-----------+
|   Guest   |          ^      |    N D H         |   Guest   |
|     A     |          |      |    A N C         |     B     |
|   +---+   |      +---+---+  |    T S P         |   +---+   |
|   |NIC|   |      | eth0  |  |    ^ ^ ^         |   |NIC|   |
+---+-+-+---+      +---+---+  |  +---+---+       +---+-+-+---+
      ^                |      |      |                 ^
      | +--------+ +---+---+  |  +---+---+ +--------+  |
      +>+ vif1.0 +-+ ebr0  +  |  + vnbr0 +-+ vif2.0 +<-+
        +--------+ +---+---+  |  +---+---+ +--------+
                       |      |      |
                   +---+---+  |  +---+---+
                   | vtap1 |  |  | vtap0 |
                   +---+---+  |  +---+---+
                       |      |      |
         +-------+  +--+--+   |   +--+--+  +-------+
      +->+ VLAN0 +--+ VDE +   |   + VDE +--+ VLAN0 +<-+
      |  +-------+  +-----+   |   +-----+  +-------+  |
      V                       |                       V
+---+-+-+---+                 |                 +---+-+-+---+
|   |NIC|   |                 |                 |   |NIC|   |
|   +---+   |                 |                 |   +---+   |
|   Guest   |                 |                 |   Guest   |
|     C     |                 |                 |     D     |
+-----------+                 |                 +-----------+

     Notes:

       * The idea here is that when the admin configures eth0 to be
         shareable, eth0 is configured as an addressless NIC enslaved
         to a bridge which has the MAC address and IP address that
         eth0 should have
       * Again, VDE is redundant here.

  6.
     Same as (4) except the QEMU guests are on a Virtual Network on
     another physical machine which is, in turn, connected to the
     Virtual Network on the first physical machine

+-----------+                 D             +-----------+
|   Guest   |             N D H             |   Guest   |
|     A     |             A N C             |     B     |
|   +---+   |             T S P             |   +---+   |
|   |NIC|   |             ^ ^ ^             |   |NIC|   |
+---+-+-+---+           +---+---+           +---+-+-+---+
      ^                     |                     ^
      |   +--------+    +---+---+    +--------+   |
      +-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+
          +--------+    +---+---+    +--------+
                            |
                        +---+---+
                        | vtap0 |
                        +---+---+
                            |
                         +--+--+
                         | VDE |
                         +--+--+
                            |
First Physical Machine      V
-------------------------------------------------------------
Second Physical Machine     ^
                            |
            +-------+    +--+--+   +-------+
      +---->+ VLAN0 +----+ VDE +---+ VLAN0 +<-----+
      |     +-------+    +-----+   +-------+      |
      V                                           V
+---+-+-+---+                               +---+-+-+---+
|   |NIC|   |                               |   |NIC|   |
|   +---+   |                               |   +---+   |
|   Guest   |                               |   Guest   |
|     C     |                               |     D     |
+-----------+                               +-----------+

     Notes:

       * What's going on here is that the two VDEs are connected over
         the network, either via a plain socket or perhaps
         encapsulated in another protocol like SSH or TLS

One interesting thing to note from all of those examples is that
although QEMU's networking options are very interesting, it doesn't
actually make sense for a network to be implemented inside a guest.
The network needs to be external to any guests, and so we use VDE to
offer similar networking options to the ones QEMU provides. All QEMU
needs to be able to do is to connect to VDE.

User Interface
==============

This isn't meant as a UI specification, but just some notes on how
this stuff might be exposed in virt-manager.

  * Networks List:
      * Name
      * Virtual/Physical
      * Status
      * Activity/traffic

  * Virtual Network Configuration:
      * Name
      * List of connected guests
      * Allow other Virtual Networks to connect to this (defaults to
        no)
      * Connect to other Virtual Network (defaults to none)
      * DHCP enabled - DHCP configuration:
          * IP range (optional)
          * Router IP address (optional)
          * Guest IP address/hostname assignment (optional)
      * Forwarding enabled - firewall configuration:
          * Incoming ports list and destination guest+port for each
            (defaults to empty)
          * Blocked outgoing ports list (defaults to empty)

  * Virtual NICs list:
      * Guest interface name
      * Virtual Network/Shared Physical Interface
      * Hostname (defaults to guest name)
      * IP address (if assigned)
      * MAC address (if assigned)

  * Virtual NIC Configuration:
      * Random MAC address, or user-supplied MAC address.
      * Virtual Network or Shared Physical Interface to connect to.

Implementation
==============

Parity with the current state of networking with Xen will be achieved
by:

  * Implementing "shared physical interface" support in Fedora's
    initscripts and network configuration tool. It boils down to
    configuring the interface (e.g. eth0) something like:

      ifcfg-peth0:
        DEVICE=peth0
        ONBOOT=yes
        BRIDGE=eth0
        HWADDR=00:30:48:30:73:19

      ifcfg-eth0:
        DEVICE=eth0
        TYPE=Bridge
        ONBOOT=yes
        BOOTPROTO=dhcp

  * Fixing Xen so that netloop is no longer required. Upstream have
    ideas about how to make Xen automatically copy any frames that
    are destined for Dom0 so that the netback driver doesn't run out
    of shared pages if Dom0 doesn't process the frames quickly
    enough.

  * Creating new network/vif scripts for Xen which will connect
    guests to a shared physical interface's bridge.

Virtual Networks will be implemented in libvirt.
First, there will be an XML description of Virtual Networks e.g.:

  <network id="0">
    <name>Foo</name>
    <uuid>596a5d2171f48fb2e068e2386a5c413e</uuid>
    <listen address="172.31.0.5" port="1234" />
    <connections>
      <connection address="172.31.0.6" port="4321" />
    </connections>
    <dhcp enabled="true">
      <ip address="10.0.0.1"
          netmask="255.255.255.0"
          start="10.0.0.128"
          end="10.0.0.254" />
    </dhcp>
    <forwarding enabled="true">
      <incoming default="deny">
        <allow port="123" domain="foobar" destport="321" />
      </incoming>
      <outgoing default="allow">
        <deny port="25" />
      </outgoing>
    </forwarding>
  </network>

In a manner similar to libvirt's QEMU support, there will be a daemon
to manage Virtual Networks. The daemon will have access to a store of
network definitions. The daemon will be responsible for managing the
bridge devices, the vde_switch, DHCP and DNS processes, and the
iptables rules needed for SNAT/DNAT etc.

The virsh command line interface would look like:

  $> virsh network-create foo.xml
  $> virsh network-dumpxml Foo > foo.xml
  $> virsh network-define foo.xml
  $> virsh network-list
  $> virsh network-start Foo
  $> virsh network-stop Foo
  $> virsh network-restart Foo

The libvirt API for virtual networks would be modelled on the API for
virtual machines:

/*
 * Virtual Networks API
 */

/**
 * virNetwork:
 *
 * a virNetwork is a private structure representing a virtual network.
 */
typedef struct _virNetwork virNetwork;

/**
 * virNetworkPtr:
 *
 * a virNetworkPtr is a pointer to a virNetwork private structure, this
 * is the type used to reference a virtual network in the API.
 */
typedef virNetwork *virNetworkPtr;

/**
 * virNetworkCreateFlags:
 *
 * Flags OR'ed together to provide specific behaviour when creating a
 * Network.
 */
typedef enum {
    VIR_NETWORK_NONE = 0
} virNetworkCreateFlags;

/*
 * List active networks
 */
int virConnectNumOfNetworks(virConnectPtr conn);
int virConnectListNetworks(virConnectPtr conn,
                           int *ids,
                           int maxids);

/*
 * List inactive networks
 */
int virConnectNumOfDefinedNetworks(virConnectPtr conn);
int virConnectListDefinedNetworks(virConnectPtr conn,
                                  const char **names,
                                  int maxnames);

/*
 * Lookup network by name, id or uuid
 */
virNetworkPtr virNetworkLookupByName(virConnectPtr conn,
                                     const char *name);
virNetworkPtr virNetworkLookupByID(virConnectPtr conn,
                                   int id);
virNetworkPtr virNetworkLookupByUUID(virConnectPtr conn,
                                     const unsigned char *uuid);
virNetworkPtr virNetworkLookupByUUIDString(virConnectPtr conn,
                                           const char *uuid);

/*
 * Create active transient network
 */
virNetworkPtr virNetworkCreateXML(virConnectPtr conn,
                                  const char *xmlDesc,
                                  unsigned int flags);

/*
 * Define inactive persistent network
 */
virNetworkPtr virNetworkDefineXML(virConnectPtr conn,
                                  const char *xmlDesc);

/*
 * Delete persistent network
 */
int virNetworkUndefine(virNetworkPtr network);

/*
 * Activate persistent network
 */
int virNetworkCreate(virNetworkPtr network);

/*
 * Network destroy/free
 */
int virNetworkDestroy(virNetworkPtr network);
int virNetworkFree(virNetworkPtr network);

/*
 * Network information
 */
const char *virNetworkGetName(virNetworkPtr network);
unsigned int virNetworkGetID(virNetworkPtr network);
int virNetworkGetUUID(virNetworkPtr network,
                      unsigned char *uuid);
int virNetworkGetUUIDString(virNetworkPtr network,
                            char *buf);
char *virNetworkGetXMLDesc(virNetworkPtr network,
                           int flags);

Discussion points on the XML format and API:

  * The XML format isn't thought out at all, but briefly:

      * The <listen> and <connections> elements describe networks
        connected across physical machine boundaries.
      * The <dhcp> element describes the configuration of the DHCP
        server on the network.
      * The <forwarding> element describes how incoming and outgoing
        connections are forwarded.

  * Since virConnect is supposed to be a connection to a specific
    hypervisor, does it make sense to create networks (which should
    be hypervisor agnostic) through virConnect?

  * Are we needlessly replicating any mistakes from the domains API
    here? e.g. is the transient vs. persistent distinction useful for
    networks?

  * Is a UUID useful for networks? Yes, because it distinguishes
    between networks of the same name on different hosts?

  * Where is the connection between domains and networks in either
    the API or the XML format? How is a domain associated with a
    network? You put a bridge name in the <network> definition and
    use that in the domain's <interface> definition? Or you put the
    network name in the interface definition and have libvirt look up
    the bridge name when creating the guest?

  * Should it be possible to stop/start/restart a network? What for?
    If something breaks the user restarts it to see if that will fix
    it?
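
On the question above of how a domain gets associated with a network,
the second option (network name in the domain's <interface>
definition, with libvirt looking up the bridge name when the guest is
created) might look roughly like the sketch below. This is only an
illustration under assumptions: struct net_def and
net_bridge_for_name() are hypothetical names, not part of the proposed
API, and the in-memory array merely stands in for the daemon's store
of network definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical record tying a network name to the bridge device that
 * backs it. Not part of the proposed libvirt API.
 */
struct net_def {
    const char *name;   /* network name, as in <name>Foo</name> */
    const char *bridge; /* backing bridge device, e.g. "vnbr0" */
};

/*
 * Hypothetical helper: return the bridge device for the named network,
 * or NULL if no network of that name is defined.
 */
const char *
net_bridge_for_name(const struct net_def *defs, size_t ndefs,
                    const char *name)
{
    size_t i;

    assert(defs != NULL || ndefs == 0);
    assert(name != NULL);

    for (i = 0; i < ndefs; i++) {
        if (strcmp(defs[i].name, name) == 0)
            return defs[i].bridge;
    }

    return NULL; /* unknown network: the caller should refuse to start the guest */
}
```

With a table containing { "Foo", "vnbr0" }, a lookup of "Foo" would
hand back "vnbr0" for the guest's vif or TAP device to be attached to,
while an unknown name yields NULL so that guest creation can fail
cleanly rather than attach to the wrong bridge.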