On Mon, Jan 15, 2007 at 08:06:18PM +0000, Mark McLoughlin wrote:
> Hi,
>     Dan and I have been discussing how to "fix networking", not just Xen's
> networking but also getting something sane wrt. QEMU/KVM etc.
>
>     Comments very welcome on the writeup below. The libvirt stuff is
> towards the end, but I think all of it is probably useful to this list.

Since we've disappeared down a rat-hole with the other part of the thread,
here's an attempt to get back on-topic :-)

>   1. A privileged user creates two (Xen) guests, each with a Virtual
>      Network Interface. Without any special networking configuration,
>      these two guests are connected to a default Virtual Network
>      which contains a combined Virtual Bridge/Router/Firewall.
>
>     +-----------+                 D             +-----------+
>     |   Guest   |             N D H             |   Guest   |
>     |     A     |             A N C             |     B     |
>     |   +---+   |             T S P             |   +---+   |
>     |   |NIC|   |             ^ ^ ^             |   |NIC|   |
>     +---+-+-+---+           +---+---+           +---+-+-+---+
>           ^                     |                     ^
>           |   +--------+    +---+---+    +--------+   |
>           +-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+
>               +--------+    +-------+    +--------+
>
>      Notes:
>
>        * "vnbr0" is a bridge device with its own IP address on
>          the same subnet as the guests.
>        * IP forwarding is enabled in Dom0. Masquerading and DNAT
>          are implemented using iptables.
>        * We run a DHCP server and a DNS proxy in Dom0 (e.g.
>          dnsmasq)
>
>   2. A privileged user does exactly the same thing as (1), but with
>      QEMU guests.
>
>                                   D
>                               N D H
>                               A N C
>                               T S P
>                               ^ ^ ^
>                             +---+---+
>                                 |
>                             +---+---+
>     +-----------+           | vnbr0 |           +-----------+
>     |   Guest   |           +---+---+           |   Guest   |
>     |     A     |               |               |     B     |
>     |   +---+   |           +---+---+           |   +---+   |
>     |   |NIC|   |           | vtap0 |           |   |NIC|   |
>     +---+-+-+---+           +---+---+           +---+-+-+---+
>           ^       +-------+     |     +-------+       ^
>           |       |       | +---+---+ |       |       |
>           +------>+ VLAN0 +-+  VDE  +-+ VLAN0 +<------+
>                   |       | +-------+ |       |
>                   +-------+           +-------+
>
>      Notes:
>
>        * VDE is a userspace ethernet bridge implemented using
>          vde_switch
>        * "vtap0" is a TAP device created by vde_switch
>        * Everything else is the same as (1)
>        * This could be done without vde_switch by having Guest A
>          create vtap0 and have Guest B connect directly to Guest
>          A's VLAN. However, if Guest A is shut down, Guest B's
>          network would go down.

Since the user is privileged, another way to do without VDE is to mirror
the Xen case almost exactly, creating one tap device per guest instead of
Xen's netback vif devices:

    +-----------+                 D             +-----------+
    |   Guest   |             N D H             |   Guest   |
    |     A     |             A N C             |     B     |
    |   +---+   |             T S P             |   +---+   |
    |   |NIC|   |             ^ ^ ^             |   |NIC|   |
    +---+-+-+---+           +---+---+           +---+-+-+---+
          ^                     |                     ^
          |   +--------+    +---+---+    +--------+   |
          +-->+ vtap0  +----+ vnbr0 +----+ vtap1  +<--+
              +--------+    +-------+    +--------+

>   3. An unprivileged user does exactly the same thing as (2).
>
>     +-----------+                               +-----------+
>     |   Guest   |          +----+----+          |   Guest   |
>     |     A     |          |userspace|          |     B     |
>     |   +---+   |          | network |          |   +---+   |
>     |   |NIC|   |          |  stack  |          |   |NIC|   |
>     +---+-+-+---+          +----+----+          +---+-+-+---+
>           ^       +-------+     |     +-------+       ^
>           |       |       | +---+---+ |       |       |
>           +------>+ VLAN0 +-+  VDE  +-+ VLAN0 +<------+
>                   |       | +-------+ |       |
>                   +-------+           +-------+
>
>      Notes:
>
>        * Similar to (2) except there can be no TAP device or
>          bridge
>        * The userspace network stack is implemented using
>          slirpvde to provide a DHCP server and DNS proxy to the
>          network, but also effectively a SNAT and DNAT router.
>        * slirpvde implements ethernet, ip, tcp, udp, icmp, dhcp,
>          tftp (etc.) in userspace. Completely crazy, but since
>          the kernel apparently has no secure way to allow
>          unprivileged users to leverage the kernel's network
>          stack for this, it must be done in userspace.

Is it practical to just have some kind of privileged proxy that would
merely create & configure the tap devices on behalf of the unprivileged
guests? If we just create tap devices for any unprivileged guest, but keep
them disconnected from any real network device, would that still be a big
hole?

Or can we leverage QEMU's builtin SLIRP or other non-TAP networking modes
to construct something reasonable in userspace, without using VDE?

>   4. Same as (2), except the user also creates two Xen guests.
>
>     +-----------+                 D             +-----------+
>     |   Guest   |             N D H             |   Guest   |
>     |     A     |             A N C             |     B     |
>     |   +---+   |             T S P             |   +---+   |
>     |   |NIC|   |             ^ ^ ^             |   |NIC|   |
>     +---+-+-+---+           +---+---+           +---+-+-+---+
>           ^                     |                     ^
>           |   +--------+    +---+---+    +--------+   |
>           +-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+
>               +--------+    +---+---+    +--------+
>                                 |
>                             +---+---+
>                             | vtap0 |
>                             +---+---+
>                                 |
>                 +-------+    +--+--+   +-------+
>           +---->+ VLAN0 +----+ VDE +---+ VLAN0 +<-----+
>           |     +-------+    +-----+   +-------+      |
>           V                                           V
>     +---+-+-+---+                               +---+-+-+---+
>     |   |NIC|   |                               |   |NIC|   |
>     |   +---+   |                               |   +---+   |
>     |   Guest   |                               |   Guest   |
>     |     C     |                               |     D     |
>     +-----------+                               +-----------+
>
>      Notes:
>
>        * In this case we could do away with VDE and have each
>          QEMU guest use its own TAP device.

Yep, that would make sense if the guests were privileged - best to stay
close to kernel networking devices if at all possible.

>   5. Same as (3) except Guests A and C are connected to a Shared
>      Physical Interface.
>
>     +-----------+                 |        D         +-----------+
>     |   Guest   |          ^      |    N D H         |   Guest   |
>     |     A     |          |      |    A N C         |     B     |
>     |   +---+   |      +---+---+  |    T S P         |   +---+   |
>     |   |NIC|   |      | eth0  |  |    ^ ^ ^         |   |NIC|   |
>     +---+-+-+---+      +---+---+  |  +---+---+       +---+-+-+---+
>           ^                |      |      |                 ^
>           | +--------+ +---+---+  |  +---+---+ +--------+  |
>           +>+ vif1.0 +-+ ebr0  +  |  + vnbr0 +-+ vif2.0 +<-+
>             +--------+ +---+---+  |  +---+---+ +--------+
>                            |      |      |
>                        +---+---+  |  +---+---+
>                        | vtap1 |  |  | vtap0 |
>                        +---+---+  |  +---+---+
>                            |      |      |
>              +-------+  +--+--+   |   +--+--+  +-------+
>           +->+ VLAN0 +--+ VDE +   |   + VDE +--+ VLAN0 +<--+
>           |  +-------+  +-----+   |   +-----+  +-------+   |
>           V                       |                        V
>     +---+-+-+---+                 |                  +---+-+-+---+
>     |   |NIC|   |                 |                  |   |NIC|   |
>     |   +---+   |                 |                  |   +---+   |
>     |   Guest   |                 |                  |   Guest   |
>     |     C     |                 |                  |     D     |
>     +-----------+                 |                  +-----------+
>
>      Notes:
>
>        * The idea here is that when the admin configures eth0 to
>          be shareable, eth0 is configured as an addressless NIC
>          enslaved to a bridge which has the MAC address and IP
>          address that eth0 should have
>        * Again, VDE is redundant here.

This diagram just scares me, but I guess it's merely showing two isolated
networks with a different set of guests on each. It would probably be much
less scary if it weren't ASCII art.

>   6. Same as (2) except the QEMU guests are on a Virtual Network on
>      another physical machine which is, in turn, connected to the
>      Virtual Network on the first physical machine
>
>     +-----------+                 D             +-----------+
>     |   Guest   |             N D H             |   Guest   |
>     |     A     |             A N C             |     B     |
>     |   +---+   |             T S P             |   +---+   |
>     |   |NIC|   |             ^ ^ ^             |   |NIC|   |
>     +---+-+-+---+           +---+---+           +---+-+-+---+
>           ^                     |                     ^
>           |   +--------+    +---+---+    +--------+   |
>           +-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+
>               +--------+    +---+---+    +--------+
>                                 |
>                             +---+---+
>                             | vtap0 |
>                             +---+---+
>                                 |
>                              +--+--+
>                              | VDE |
>                              +--+--+
>                                 |
>     First Physical Machine      V
>     ---------------------------------------------------------
>     Second Physical Machine     ^
>                                 |
>                 +-------+    +--+--+   +-------+
>           +---->+ VLAN0 +----+ VDE +---+ VLAN0 +<-----+
>           |     +-------+    +-----+   +-------+      |
>           V                                           V
>     +---+-+-+---+                               +---+-+-+---+
>     |   |NIC|   |                               |   |NIC|   |
>     |   +---+   |                               |   +---+   |
>     |   Guest   |                               |   Guest   |
>     |     C     |                               |     D     |
>     +-----------+                               +-----------+
>
>      Notes:
>
>        * What's going on here is that the two VDEs are connected
>          over the network, either via a plain socket or perhaps
>          encapsulated in another protocol like SSH or TLS

This is the case where I always thought VDE did get interesting - being
able to create pure userspace virtual networks across machines, without
any root privileges. Gives joe-user a nice lot of power.

> Virtual Networks will be implemented in libvirt. First, there will be an
> XML description of Virtual Networks e.g.:
>
>   <network id="0">
>     <name>Foo</name>
>     <uuid>596a5d2171f48fb2e068e2386a5c413e</uuid>
>     <listen address="172.31.0.5" port="1234" />
>     <connections>
>       <connection address="172.31.0.6" port="4321" />
>     </connections>
>     <dhcp enabled="true">
>       <ip address="10.0.0.1"
>           netmask="255.255.255.0"
>           start="10.0.0.128"
>           end="10.0.0.254" />
>     </dhcp>
>     <forwarding enabled="true">
>       <incoming default="deny">
>         <allow port="123" domain="foobar" destport="321" />
>       </incoming>
>       <outgoing default="allow">
>         <deny port="25" />
>       </outgoing>
>     </forwarding>
>   </network>

We also have to think about how we connect guest domains to the virtual
network. Currently we just have something really simple like

    <interface type="bridge">
      <source bridge='xenbr0'/>
      <mac address='00:11:22:33:44:55'/>
    </interface>

I guess we'd probably want to refer to the UUID of the network to map it
into the guest.

Oh, do we need to define a 'network 0' to be the physical network of the
host machine? What if there are multiple host NICs - any conventions we
need to let us distinguish? Maybe it's best to just refer to the host
network by using IP addresses - so we can deal better with the case where
a machine switches from eth0 -> eth1 (wired to wireless) but keeps the
same IP address, or some such.

>   * The XML format isn't thought out at all, but briefly:
>       * The <listen> and <connections> elements describe
>         networks connected across physical machine boundaries.
>       * The <dhcp> element describes the configuration of the
>         DHCP server on the network.
>       * The <forwarding> element describes how incoming and
>         outgoing connections are forwarded.

Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|