Re: bridge + SR-IOV guests with KVM

Laine Stump <laine@xxxxxxxxxx> · Tue, 1 Sep 2020 23:42:52 -0400

On 8/27/20 4:12 AM, Philipp Rosenberger wrote:
Hi,

I managed to get SR-IOV with an Intel I350 NIC to work. For this I 
followed the documentation on this page:
https://wiki.libvirt.org/page/Networking#Assignment_from_a_pool_of_SRIOV_VFs_in_a_libvirt_.3Cnetwork.3E_definition 

But as I have more VMs than VFs on the NIC, I also have a bridge which 
serves the other guests. As I run Debian Buster as host I followed the 
documentation here:
https://wiki.libvirt.org/page/Networking#Debian.2FUbuntu_Bridging

Do I correctly assume that you're attaching the PF of the SRIOV card to 
the bridge? (in particular, the PF of the same port that the VFs are from)

If I use only the SRIOV everything works as expected. All guests can be 
reached from the network and the guests and host can reach each other. 
The same goes for a sole bridged environment.

I think you missed a paragraph here - I'm inferring that at this point 
you meant to say "But when I have both guests connected with an assigned 
VF and guests connected via a virtio-net device connected to the bridge 
via a tap device, the VF-connected guests cannot communicate with the 
bridge-connected guests." Is that correct, or am I inferring too much?

As I dived into the issue I found an answer from the Intel community:
https://community.intel.com/t5/Ethernet-Products/82599-VF-to-Linux-host-bridge/td-p/351802 

By default all the ports on a Linux host bridge should have flood and 
learning turned on. If manually adding a MAC address to the Linux host 
bridge is enough to make the traffic flow, I would have thought that 
having flood+learning turned on would be enough for the bridge to learn 
the proper port for traffic destined to the VF.

Really, my first assumption would have been that the switch screwing up 
was the switch in the SRIOV card, incorrectly sending traffic for the 
bridged guests directly out the PF's physical port rather than to the 
Linux host bridge, and that the solution to make everything work would 
be to somehow add the MAC address of the *bridged* guests into the fdb 
of the SRIOV card.

[at this point I read further through your message and follow the link 
to the slides...]

Ooohh..... from slide 33, I see that *is* what's being done. It sounds 
like that's what you *are* doing - adding an fdb entry to the internal 
switch in the SRIOV card, *not* to the Linux host bridge (as I had 
thought right up until 3 paragraphs ago), correct?

It's still interesting, though - it makes it sound like the switch in 
the SRIOV card has no learning and no flood.

It says I need to add the "VF mac addresses and eth0 mac address to 
bridge forwarding database".

That definitely is what's said in the comments from the Intel forum. But 
isn't that the opposite of what they're saying on slide 33 of the 
presentation you linked to? My understanding is that it's saying that 
you need to add the *bridged* guest interface's MAC address to the fdb 
on the SRIOV card's switch.

I have done this with the following command:
bridge fdb add 52:54:00:3c:1c:e6 dev eth_lan0

The MAC address is from my VM which is on the bridge. And the eth_lan0 
interface is the physical interface of my bridge and also the PF of my 
I350 NIC.

Ah, okay, this verifies my first assumption (that the bridge is attached 
to the PF). It also verifies that you did what's suggested in slide 33 
of the presentation, not what was suggested in the Intel forum post. Is 
that correct?

This seems to work. But doing this manually is annoying and a bit of a 
hassle when creating new VMs on the bridge.

Is there a way to let libvirt do this work?

Well, if you create a libvirt network for your bridge device, something 
like this:

  <network>
    <name>bridgenet</name>
    <bridge name='br0'/>
    <forward mode='bridge'/>
  </network>

(this is an "unmanaged" network, i.e. libvirt expects the bridge to 
already exist, doesn't add any iptables rules or dnsmasq instance - it 
just creates tap devices and connects them to the already-existing 
bridge), and then configure all your guest interfaces with:

    <interface type='network'>
      <source network='bridgenet'/>
      ...
    </interface>

then a network hook will be called each time one of these interfaces is 
added or removed, and that script can add/remove the fdb entry. Hooks 
are described here:

  https://libvirt.org/hooks.html

If you have an executable file named /etc/libvirt/hooks/network (or a 
file of your own naming in /etc/libvirt/hooks/network.d if your libvirt 
is 6.5.0 or newer) it will be called anytime a guest interface is added 
to or removed from a libvirt network. The arguments will describe the 
current action, and stdin of the script will receive the full XML config 
of the network, as well as the XML for a <networkport>, which contains 
the interface's MAC address among other things. This should be enough 
information to derive the proper "bridge fdb add" command. (You'll want 
a similar clause in the same hook script that takes action when the 
interface is removed from the network.)
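
A minimal, untested sketch of what such a hook could look like in 
Python, under several assumptions you should check against hooks.html 
for your libvirt version: that the operation argument is "port-created" 
when an interface is plugged in and "port-deleted" when it is removed, 
that the XML on stdin contains a <networkport> element with a 
<mac address='...'/> child, and that "bridgenet" and "eth_lan0" are the 
network and PF names from this thread. The helper name 
build_fdb_command is just for illustration.

```python
#!/usr/bin/env python3
"""Sketch of /etc/libvirt/hooks/network: mirror guest MACs into the PF's fdb."""
import subprocess
import sys
import xml.etree.ElementTree as ET

NETWORK = "bridgenet"  # libvirt network name used in the example above
PF_DEV = "eth_lan0"    # PF that is also the bridge's physical port

# Assumed operation names; verify them in hooks.html for your libvirt version.
ACTIONS = {"port-created": "add", "port-deleted": "del"}


def build_fdb_command(network, operation, hook_xml, pf_dev=PF_DEV):
    """Return the 'bridge fdb' argv for this event, or None if nothing to do."""
    action = ACTIONS.get(operation)
    if network != NETWORK or action is None:
        return None  # some other network, or an event we do not care about
    root = ET.fromstring(hook_xml)
    # Assumption: the port's MAC lives in <networkport><mac address='...'/>.
    for port in root.iter("networkport"):
        mac = port.find("mac")
        if mac is not None and mac.get("address"):
            return ["bridge", "fdb", action, mac.get("address"), "dev", pf_dev]
    return None


def main(argv):
    # libvirt calls the hook as: <network name> <operation> <sub-operation> <extra>
    if len(argv) < 3:
        return
    cmd = build_fdb_command(argv[1], argv[2], sys.stdin.read())
    if cmd is not None:
        # check=False: ignore a failing bridge command instead of raising.
        subprocess.run(cmd, check=False)


if __name__ == "__main__":
    main(sys.argv)
```

The script only reacts to the one network it is told about, since the 
hook is called for every libvirt network on the host. For the guest in 
your example, the add case would end up running "bridge fdb add 
52:54:00:3c:1c:e6 dev eth_lan0", i.e. the same command you ran by hand.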

If you have trouble with the script, you can look in the #virt channel 
on irc.oftc.net - if it's a weekday and the right time of day, you may 
get immediate help (I'm actually curious to hear how this works out. It 
*almost* sounds like something that could be integrated into the 
standard config of libvirt, although I'm not sure how to make it easily 
consumable).

(NB: you could also use the network hook script to perform some action 
when the VFs are added/removed from guests).

Best regards
Philipp

PS:
This presentation shows it pretty well on page 33:
https://events.static.linuxfound.org/sites/events/files/slides/LinuxConJapan2014_makita_0.pdf