Re: Virtual networking (not the rathole thread :-)

Mark McLoughlin <markmc@xxxxxxxxxx> · Wed, 17 Jan 2007 18:38:25 +0000

On Tue, 2007-01-16 at 22:28 +0000, Daniel P. Berrange wrote:
> On Mon, Jan 15, 2007 at 08:06:18PM +0000, Mark McLoughlin wrote:

> Since we've disappeared down a rat-hole with the other part of the thread,
> here's an attempt to get back on-topic :-)

	Indeed :-)

> Since the user is privileged, another way to do without VDE is to mirror
> the Xen case almost exactly, creating one tap device per guest, instead
> of Xen's netback vif devices:

	Sure. There is the argument that always using VDE is nicer because it's
consistent with the non-privileged and remotely connected network
versions.

	As you say, though, this way is consistent with the Xen version.

> >      3. An unprivileged user does exactly the same thing as (2).
> >         
> >           +-----------+                               +-----------+
> >           |   Guest   |          +----+----+          |   Guest   |
> >           |     A     |          |userspace|          |     B     |
> >           |   +---+   |          | network |          |   +---+   |
> >           |   |NIC|   |          |  stack  |          |   |NIC|   |
> >           +---+-+-+---+          +----+----+          +---+-+-+---+
> >                 ^       +-------+     |     +-------+       ^
> >                 |       |       | +---+---+ |       |       |
> >                 +------>+ VLAN0 +-+  VDE  +-+ VLAN0 +<------+
> >                         |       | +-------+ |       |
> >                         +-------+           +-------+
> >         
> >         Notes:
> >         
> >               * Similar to (2) except there can be no TAP device or
> >                 bridge 
> >               * The userspace network stack is implemented using
> >                 slirpvde to provide a DHCP server and DNS proxy to the
> >                 network, but also effectively a SNAT and DNAT router. 
> >               * slirpvde implements ethernet, ip, tcp, udp, icmp, dhcp,
> >                 tftp (etc.) in userspace. Completely crazy, but since
> >                 the kernel apparently has no secure way to allow
> >                 unprivileged users to leverage the kernel's network
> >                 stack for this, then it must be done in userspace. 
> 
> Is it practical to just have some kind of privileged proxy that would
> merely create & configure the tap devices on behalf of the unprivileged
> guests ? If we just create tap devices for any unprivileged guest, but
> kept them disconnected from any real network device, would that still be
> a big hole ?

	Okay, to avoid a userspace network stack, you need a way to securely
allow guests running as unprivileged users to use the kernel's network
stack. That implies:

  1) The packets/frames have to arrive on a network interface created 
     by the user (e.g. a TAP or SLIP iface)

  2) It should not be possible to spoof as another host or adversely 
     affect the host's connectivity, or any other machine on the same 
     network as the host

  3) slirp prevents spoofing by effectively translating the source
     address of any packet which leaves the virtual network, just like a
     router using SNAT

  4) We can do the same thing by enabling IP forwarding and having all 
     packets forwarded by the host go through SNAT

  5) The problem with that is what to do about packets not being 
     forwarded by the host, but which are destined for the host itself? 
     SNAT in PREROUTING might do it, but that's not allowed it seems.

  6) We also have to worry about whether people could e.g. screw up the 
     host's ARP cache

  7) We also have to worry about a DOS whereby someone creates lots of 
     network interfaces

	And note, this isn't just about worrying about nasty guests. You have
to worry about what nasty users on the host could do with a setuid
helper like this.

	It's certainly got to be "possible" ... but I don't yet feel I know
what all the bases are that need to be covered, never mind how we'd
cover them.

> Or can we leverage QEMU's builtin SLIRP or other non-TAP networking modes
> to construct something reasonable in userspace, without using VDE.

	The general problem with any SLIRP derivative or similar is that it's
another network stack implementation. That makes me nervous for security,
performance, stability and portability reasons.

	Case in point: as I found out, SLIRP currently has buffer overflow
vulnerabilities and isn't 64-bit clean.

> > Virtual Networks will be implemented in libvirt. First, there will be an
> > XML description of Virtual Networks e.g.:
> > 
> >   <network id="0">
> >     <name>Foo</name>
> >     <uuid>596a5d2171f48fb2e068e2386a5c413e</uuid>
> >     <listen address="172.31.0.5" port="1234" />
> >     <connections>
> >       <connection address="172.31.0.6" port="4321" />
> >     </connections>
> >     <dhcp enabled="true">
> >       <ip address="10.0.0.1" 
> >           netmask="255.255.255.0" 
> >           start="10.0.0.128"
> >           end="10.0.0.254" />
> >     </dhcp>
> >     <forwarding enabled="true">
> >       <incoming default="deny">
> >         <allow port="123" domain="foobar" destport="321" />
> >       </incoming>
> >       <outgoing default="allow">
> >         <deny port="25" />
> >       </outgoing>
> >     </forwarding>
> >   </network>
> 
> Got to also think how we connect guest domains to the virtual network.

	Right, further on in the mail I said:

      * Where is the connection between domains and networks in either
        the API or the XML format? How is a domain associated with a
        network? You put a bridge name in the <network> definition
        and use that in the domain's <interface> definition? Or you put
        the network name in the interface definition and have libvirt
        look up the bridge name when creating the guest? 

> Currently we just have something really simple like
> 
>   <interface type="bridge">
>     <source bridge='xenbr0'/>
>     <mac address='00:11:22:33:44:55'/>
>   </interface>
> 
> I guess we'd probably want to refer to the UUID of the network to map
> it into the guest.

	Well, the UUID isn't much good if you can't map it. So, it would
probably be the name and libvirt URI, right?
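
	To make the "network name in the interface definition" option a bit
more concrete, here is a rough sketch of what the guest side might look
like. The type="network" value and the network attribute are purely
hypothetical, nothing like them exists in the current format, and "Foo"
is just the name from the example network definition. The libvirt URI
part isn't shown at all:

```xml
<!-- Hypothetical sketch: type="network" and the "network" attribute
     are assumptions, not part of the current <interface> format -->
<interface type="network">
  <!-- refer to the virtual network by name rather than by bridge -->
  <source network="Foo"/>
  <mac address="00:11:22:33:44:55"/>
</interface>
```

	With something like this, libvirt would presumably have to look up
whatever backs the network "Foo" when creating the guest, rather than
the domain XML naming a bridge directly.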

> Oh, do we need to define a 'network 0' to be the physical network of the host
> machine - what if there are multiple host NICs - any conventions we
> need to let us distinguish ?  Maybe it's best to just refer to the host
> network by using IP addresses - so we can deal better with the case where
> a machine switches from eth0 -> eth1 (wired to wireless) but keeps the
> same IP address, or some such.

	Well, I think there should be a default virtual network defined
somehow. You shouldn't need to create one unless you want a second one.
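
	As a rough illustration only, a default network could be a minimal
instance of the format proposed earlier, without the <listen> and
<connections> parts. The name "default", the id and the address range
here are assumptions picked for the example, and the UUID is left out:

```xml
<!-- Hypothetical default network, reusing the element names from the
     proposed <network> format; all values are illustrative only -->
<network id="0">
  <name>default</name>
  <dhcp enabled="true">
    <ip address="10.0.0.1"
        netmask="255.255.255.0"
        start="10.0.0.128"
        end="10.0.0.254" />
  </dhcp>
  <forwarding enabled="true">
    <!-- no incoming port forwards; all outgoing traffic allowed -->
    <incoming default="deny" />
    <outgoing default="allow" />
  </forwarding>
</network>
```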

	But remember that under the model I'm suggesting, guests connect
*either* to a virtual network or a physical network via a "shared
physical interface".

	The shared physical interface just winds up being a bridge you enslave
the guest's interface to, so the easiest answer for that is that we
stick with the way it is right now for Xen and have QEMU create a TAP
device and enslave that to the bridge in this mode.

	Dunno, it does need more thought/discussion ... I find the current
<interface> stuff quite strange now - e.g. "bridge" vs. "ethernet" types
and the bridge name is in <source> ?

Cheers,
Mark.