On Tue, Jan 26, 2010 at 05:22:05PM -0500, Stefan Berger wrote:
> "Daniel P. Berrange" <berrange@xxxxxxxxxx> wrote on 01/26/2010 04:21:56
>
> > libvir-list, gerhard.stenzel, Vivek Kashyap, arndb
> >
> > Please respond to "Daniel P. Berrange"
> >
> > On Mon, Jan 25, 2010 at 12:47:17PM -0500, Stefan Berger wrote:
> > > Hello!
> > >
> > > The attached patch provides support for the Linux macvtap device for
> > > Qemu by passing a file descriptor to Qemu command line similar to how
> > > it is done with a regular tap device. I have modified the network XML
> > > code to understand a definition as the following one here:
> > >
> > >   <network>
> > >     <name>vepanet</name>
> > >     <uuid>4ebd5168-6321-4757-8397-f6e83484f402</uuid>
> > >     <extbridge mode='vepa' dev='eth0'/>
> > >   </network>
> >
> > I don't think this is the correct place to be adding this kind
> > of configuration / functionality. The virNetworkPtr / <network>
> > XML is describing a virtual network capability which is *not*
> > directly connected to the LAN. It may be configured to route
> > from the virtual network to the LAN, with optional NAT applied.
> > So while the implementation may use a bridge device, this bridge
> > is not connected to any physical device. Since VEPA is about
> > directly connecting VMs to the LAN, this doesn't really fit here.
>
> Yes, I have re-purposed the network XML to describe an external bridge.
>
> There are the following advantages to this:
>
> - you can migrate a VM between machines that have different types of
>   connectivity, i.e., tap and macvtap
>
> - pushing the eth0 into referenced XML makes it independent of the local
>   configuration of the host, i.e., on the one host it may be eth0 and on
>   the other eth1. eth0 in the above XML could be a physical adapter, or
>   an SR-IOV physical adapter or virtual function of an SR-IOV adapter.

I agree that those are both good advantages, but I'm still not liking
the idea of re-purposing the network XML model for this.
Unfortunately I don't yet have a clear alternative that satisfies those
goals. I rather regret that the current stuff uses the name 'network'
since it is somewhat misleading as to its purpose :-)

The best idea I can come up with so far is to imagine a new "switch"
object which would basically use the syntax you are suggesting as an
extension for the 'network' object, but without all the existing bits
to do with NAT/routing/DHCP. A 'switch' object might be something that
is also useful for the parallel work being done in firewall filters
in libvirt.

I don't think we necessarily need to consider this mutually exclusive
wrt the direct syntax I suggest for VMs. We could start with the direct
syntax in VMs since that's pretty quick & easy to implement, and then
introduce the idea of a 'switch' object later to give us an alternate
host-independent config.

> > In the context of bridging a guest to a plain ethernet device, these
> > fit together as follows
> >
> >  1. The virNodeDevPtr APIs are used to discover what physical network
> >     devices exist, 'eth0'
> >
> >  2. The virInterfacePtr APIs are used to create a bridge on the host
> >     br0, containing the physical device 'eth0'
> >
> Yes, I suppose this is all done via 'virsh iface-*' commands.

Yes, that's correct.

> > So unless I'm missing something major in my reasoning here I think
> > in the domain XML we end up with two possible configs for guest
> > network interfaces
> >
> >
> >  1. The current one using plain Linux software bridging, which
> >     we can't change in an incompatible way
> >
> >       <interface type='bridge'>
> >         <source bridge='br0'/>
> >         <target dev='vnet0'/>
> >       </interface>
> >
> >     Here, the source device is a bridge previously set up
> >     to have a physical device enslaved (regular or SR-IOV).
> >     The target device is the plain TAP device.
>
> plain TAP device -> no need for change here.
>
> >
> >  2.
> >     A new one using hardware bridging, which we can freely
> >     define for our new needs
> >
> >       <interface type='direct'>
> >         <source dev='eth0' mode='vepa|pepa|bridge'/>
> >         <target dev='vnet0'/>
> >       </interface>
>
> In contrast to the ACLs ( :-) ), where I would regard the ACLs as
> VM-attached data that ideally would migrate along when the VM migrates
> between hosts, in the case of this network attachment I'd not put
> host-specific information in the domain XML as is the case here with the
> 'eth0'. Who knows, maybe it's going to be the SR-IOV virtual adapter eth10
> on the destination side? With the redirection into the network XML (or
> similar) one could define a network XML per VM, create that with
> host-specific information on the destination, i.e., eth10, and then
> migrate the VM previously linked to eth0 via macvtap so that it is then
> connected via eth10. It's more work for upper layers, but if there is a
> need for optimization for throughput, then maybe that's the only way that
> optimizations can be done. Otherwise if all VMs in the data center are
> created with above XML and eth0 then they will all need to stay on eth0 I
> suppose.
> In this context, how will the virtual functions of SR-IOV be administered
> and given to VMs? I suppose their management would be left up to higher
> layers?

As a general rule we leave policy decisions to the management apps
and merely provide them the mechanism to implement their desired policy.

> >
> >     Here, source device is a physical device (regular or
> >     SR-IOV). The target device is a macvtap device.
> >
> > In both cases the TAP or macvtap device is created on the fly when the
> > VM is booted & destroyed at shutdown (either by the kernel, or manually
> > by libvirt for macvtap).
>
> Yes, as long as libvirt is running when the VM goes down it can delete the
> macvtap device.
> If not, I am trying to delete all macvtap devices at VM
> startup using the MAC address of the VM (which the macvtap inherits) as
> search/delete criterion.

That is more than sufficient - we already assume libvirtd is running
at time of guest shutdown. We don't officially support the scenario
of a guest shutting down while libvirtd is stopped - just make best
effort to cope.

> > > Index: libvirt/src/util/macvtap.c
> > > ===================================================================
> > > --- /dev/null
> > > +++ libvirt/src/util/macvtap.c
> > > @@ -0,0 +1,664 @@
> > > +/*
> > > + * Copyright (C) 2010 IBM Corporation
> > > + *
> > > + * This library is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU Lesser General Public
> > > + * License as published by the Free Software Foundation; either
> > > + * version 2.1 of the License, or (at your option) any later version.
> > > + *
> > > + * This library is distributed in the hope that it will be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > + * Lesser General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU Lesser General Public
> > > + * License along with this library; if not, write to the Free Software
> > > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> > > + *
> > > + * Authors:
> > > + *     Stefan Berger <stefanb@xxxxxxxxxx>
> > > + */
> > > +
> > > +#include <config.h>
> > > +
> > > +#if defined(WITH_MACVTAP)
> >
> > [snip].
> >
> > I've not had time to look at the details of this macvtap.c code yet,
> > but I assume it's doing all you need :-) Is there any benefit to using
> > the network libnl.so library, rather than the ioctl()'s directly?
> >
> Haven't looked at that library and its API, but can do so if it's
> documented.
> Would it be ok to keep the current implementation, though?

I don't mind either way. I'll leave the decision up to you since you
know more about this code than me :-) So if you prefer to use the
current code that's fine.

Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
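
For illustration, the two guest interface configurations discussed in this thread (the existing type='bridge' form and the proposed type='direct' form) can be generated programmatically. The following is only a minimal sketch using Python's standard xml.etree.ElementTree module. The function names bridge_interface and direct_interface and the DIRECT_MODES tuple are hypothetical helpers invented for this example, and the type='direct' element layout is the proposal as written in the thread, not a confirmed libvirt schema.

```python
import xml.etree.ElementTree as ET

# Modes listed in the proposal in this thread (assumption: not a confirmed schema).
DIRECT_MODES = ("vepa", "pepa", "bridge")


def bridge_interface(bridge, target):
    """Build the existing software-bridging form: source is a host bridge."""
    iface = ET.Element("interface", {"type": "bridge"})
    ET.SubElement(iface, "source", {"bridge": bridge})
    ET.SubElement(iface, "target", {"dev": target})
    return iface


def direct_interface(dev, mode, target):
    """Build the proposed hardware-bridging form: source is a physical device."""
    if mode not in DIRECT_MODES:
        raise ValueError("unsupported mode: %r" % (mode,))
    iface = ET.Element("interface", {"type": "direct"})
    ET.SubElement(iface, "source", {"dev": dev, "mode": mode})
    ET.SubElement(iface, "target", {"dev": target})
    return iface


if __name__ == "__main__":
    # Print both variants as XML strings.
    for iface in (bridge_interface("br0", "vnet0"),
                  direct_interface("eth0", "vepa", "vnet0")):
        print(ET.tostring(iface, encoding="unicode"))
```

Note that the host-specific device name ('eth0') is a plain argument here, which mirrors the migration concern raised above: a management application would have to supply the right device name for each destination host.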