On Mon, 2011-08-22 at 05:17 -0400, Laine Stump wrote:
> For some reason beyond my comprehension, the designers of SRIOV ethernet
> cards decided that the virtual functions (VF) of the card (each VF
> corresponds to an ethernet device, e.g. "eth10") should each be given a
> new+different+random MAC address each time the hardware is rebooted.

I have read that this is to avoid wasting MAC addresses from the vendor's
pool that might never be used.

> Normally, udev keeps a persistent table that associates each known MAC
> address with an ethernet device name - any time an ethernet device with
> a previously-unknown MAC address is found, a new device name is
> allocated ("eth11", etc) and the newly found MAC address is associated
> with that device name. When an ethernet device is an SRIOV VF, though,
> udev doesn't persist the MAC address, so at each boot a device is found
> with a new MAC address, but the device name from the previous boot is
> "unused" so magically the device ends up with the same name even though
> the MAC address has changed.

RHEL 6.1 seems to use the PCI address (the KERNELS match) rather than the
MAC address to manage the interface name in
/etc/udev/rules.d/70-persistent-net.rules:

# PCI device 0x8086:0x10ed (ixgbevf)
SUBSYSTEM=="net", ACTION=="add", ATTR{dev_id}=="0x0", KERNELS=="0000:15:10.0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth8"

> When this device is assigned to a guest via PCI passthrough, though, the
> guest doesn't have the necessary information to realize that it's
> actually an SRIOV VF, so the guest's udev persists the MAC address - on
> the first boot of host+guest, the guest will see it has, e.g., mac
> address 11:22:33:44:55:66 and udev will add an entry to its persistent
> table remembering that 11:22:33:44:55:66="eth0". If the host reboots,
> though, the VF will get a new MAC address, and when the guest boots, it
> will see a new MAC address (e.g.
"66:55:44:33:22:11") and think that > there's a different card, so it will create a new device (and a new udev > entry - 66:55:44:33:22:11="eth1"). This will repeat each time the host > reboots, with the obvious undesired consequences. > > This makes using SRIOV VFs via PCI passthrough very unpalatable. The > problem can be solved by setting the MAC address of the ethernet device > prior to assigning it to the guest, but of course the <hostdev> element > used to assign PCI devices to guests has no place to specify a MAC > address (and I'm not sure it would be appropriate to add something that > function-specific to <hostdev>). Dave Allan and I have discussed a > different possible method of eliminating this problem (using a new > forward type for libvirt networks) that I've outlined below. Please let > me know what you think - is this reasonable in general? If so, what > about the details? If not, any counter-proposals to solve the problem? > > Providing Predictable/Configurable MAC Addresses for SRIOV VFs used via > PCI Passthrough: > > 1) <network> will have a new forward type='hardware'. When forward > type='hardware', a pool of ethernet interfaces can be specified, just as > for the forward types "bridge", "vepa", "private", and "passthrough". At > this point, that's the only thing that I've determined is needed in the > network definition. type='hostdev'? > > 2) In a domain's <interface> definition, when type='network', if the > network has a forward type='hardware', the domain code will request an > unused ethernet device from the network driver, then do the following: > > 3) save the ethernet device name in interface/actual so that it can be > easily retrieved if libvirtd is restarted > > 4) Set the MAC address of the given ethernet device according to the > domain <interface> config. 
>
> 5) Use the NodeDevice API to learn all the necessary PCI
> domain/slot/bus/function and add a (non-persisting) <hostdev> element to
> the guest's config before starting it up.
>
> 6) When the guest is eventually destroyed, the ethernet device will be
> freed back to the network pool for use by another guest.
>
> One problem this doesn't solve is that when a guest is migrated, the PCI
> info for the allocated ethernet device on the destination host will
> almost surely be different. Is there any provision for dealing with this
> in the device passthrough code? If not, then migration will still not be
> possible.
>
> Although I realize that many people are predisposed to not like the idea
> of PCI passthrough of ethernet devices (including me), it seems that
> it's going to be used, so we may as well provide the management tools to
> do it in a sane manner.

If I understand this correctly, this outlines an "implicit" PCI
passthrough, so there is no need to provide an explicit <hostdev/> element
in the domain XML. Guest configs using an explicit <hostdev/> element would
still expose the problem outlined above, correct? Are there any plans for
those?

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
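
As a toy illustration of the naming behaviour discussed in this thread, the
Python sketch below contrasts a persistent name table keyed by MAC address
with one keyed by PCI address. It is only a model under stated assumptions,
not udev's actual implementation: the class PersistentNames and all variable
names are hypothetical, and the MAC and PCI values are simply the examples
quoted above.

```python
from typing import Dict, List


class PersistentNames:
    """Hypothetical model of a persistent key -> interface-name table."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}

    def name_for(self, key: str) -> str:
        # An unknown key gets the next free "ethN" name and is remembered;
        # a known key always maps back to the name it was given before.
        if key not in self.table:
            self.table[key] = f"eth{len(self.table)}"
        return self.table[key]


# One VF seen by the guest across two host boots: the PCI address stays
# the same, but the VF comes up with a different MAC address each time.
pci_addr = "0000:15:10.0"
macs_per_boot: List[str] = ["11:22:33:44:55:66", "66:55:44:33:22:11"]

by_mac = PersistentNames()  # table keyed by MAC address
by_pci = PersistentNames()  # table keyed by PCI address

mac_names = [by_mac.name_for(mac) for mac in macs_per_boot]
pci_names = [by_pci.name_for(pci_addr) for _ in macs_per_boot]

print(mac_names)  # ['eth0', 'eth1'] : a new name after every host reboot
print(pci_names)  # ['eth0', 'eth0'] : the name stays stable
```

In this model the MAC-keyed table grows by one entry per host reboot, while
the PCI-keyed table keeps a single entry. Setting a fixed MAC address on the
VF before it is assigned would make the MAC-keyed table stable as well.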