Re: [PATCHv2 0/9] Let libvirt manage a bridge's FDB

Laine Stump <laine@xxxxxxxxx> · Thu, 04 Dec 2014 03:56:53 -0500

On 12/03/2014 12:02 PM, Daniel P. Berrange wrote:
> On Tue, Dec 02, 2014 at 12:08:30PM -0500, Laine Stump wrote:
>> (Everyone - see the request for opinions/ideas towards the bottom)
>>
>> The idea behind these patches is the following:
>>
>> 1) most virtual machines only have a single MAC address behind each
>> interface, and that MAC address is known by libvirt.
>>
>> 2) If we (i.e. libvirt) manually add an entry to the bridge's
>> forwarding database (fdb) for the MAC address associated with a port
>> on the bridge, we can turn off learning and unicast_flooding for that
>> port.
>>
>> 3) kernels starting with 3.15 (and actually working correctly starting
>> in kernel 3.17) will notice that all of a bridge's ports have flood
>> and learning turned off, and in that case will turn off promiscuous
>> mode on all ports. If all but one of the ports have flood/learning
>> turned off, then promiscuous will be turned off on that port (and left
>> on for all the other ports)
>>
>> 4) When (4) can be done, there is a measurable performance
>> advantage. It can also *kind of* help security, as it will prevent a
>> guest from doing anything useful if it changes its MAC address (but
>> won't prevent the guest from *sending* packets with a spoofed MAC
>> address).
>>
>> NB: These only work with a fixed MAC address, and no vlan tags set in
>> the guest. Support for both of those will be coming.
> So just to be clear on the functional differences between the
> impls
>
> With the current impl
>
>  1. The bridge has a table mapping MAC <-> TAP devices
>     which is initially empty
>  2. Whenever a TAP device sends a packet, it causes an
>     entry to be added to the MAC <-> TAP table
>  3. When the bridge needs to forward a packet with a MAC
>      - If MAC is unknown, it is sent to all TAP devices
>      - Else it is send to the specific TAP device
>  4. The physical device is always promiscuous to see all
>     traffic on the LAN
>
> An result of point 2, is that if the guest changes its
> MAC address on the fly, it merely needs to send a packet
> and the bridge will learn its new MAC address. That said
> our firewall rules can potentially block this dynamic
> change.
>
>
> With the new impl
>
>  1. The bridge has a table mapping MAC <-> TAP devices
>     which is initially empty
>  2. Libvirt tells the bridge what MAC address is associated
>     with each TAP device added
>  3. The bridge always forwards the packets to the correct
>     TAP device
>  4. The physical device is never promiscuous, it is
>     programmed with MACs of all TAP devices
>
> In both cases the guest can send packets with spoofed
> MAC addresses. In the old code these packets would make
> it onto the LAN and it can receive replies back. In the
> new code these packets would make it onto the LAN but
> it will never receive replies back.
>
>
> At a high level the conceptual difference is whether the table
> of MAC addresses & TAP devices is statically defined by libvirt
> or dynamically defined on the fly.
>
>> HERE IS THE REQUEST FOR OPINIONS/IDEAS:
>>
>> This V2 of the patchset addresses several issues brought up by jferlan
>> on the original series, and changes the name of the attribute from:
>>
>>    promiscLinks='yes|no'
>>
>> to
>>
>>    fdb='learningWithFlood|managed'
>>
>> I'm somewhat more happy with this new naming than the previous but
>> still looking for better ideas. It is closer to describing what the
>> new code really does, but "learningWithFlood" seems a bit long and
>> awkward, while I have been told that "fdb" is too short and
>> unrecognizeable (I will point out that 1) "fdb" is the same name used
>> by iproute2's "bridge" command, and 2) another bridge option, "stp" is
>> also a three letter acronym that will only be recognized by those
>> familiar with configuring an L2 bridge device or watching NASCAR on
>> Saturday afternoons (or whenever it's on - not a fan myself :-))
> One thing to remember is that libvirt is supposed to be providing a
> higher level configuration language that is independant of specific
> implementations.  'fdb' is terminology specific to the Linux
> bridge implementation, so I don't think that's appropriate to use
> in the XML configuration.

Good point. I'd been seeing this as a generic term, but I guess it isn't.

> There is a user visible change in behaviour because this new solution,
> as implemented in this patch series, will no longer allow a guest to
> change its MAC address on the fly.
>
> Of course if QEMU can provide a notification to libvirt when the guest
> changes its MAC address, then libvirt can update the MAC table and so
> re-gain functional equivalance with the old solution. I think we would
> want to be able to turn that on or off though.

For macvtap interfaces, we can already do that using
trustGuestRxFilters='yes' (a name I don't really like in a location I
don't really like (attribute of toplevel <interface> element), but
couldn't come up with anything better and didn't get any other
suggestions at the time). This has been in libvirt since last month's
release. My thought was that if trustGuestRxFilters is yes and the
interface is on a bridge, we can then honor any changes to mac address.

(This is where my dislike of my own choice for name of that option comes
in - I can now see a need to allow the guest to change its vlan table or
multicast table and honor those changes, while not allowing it to change
its MAC address, or vice versa; it probably would have been better to
add a new element called <rxFilter> or something, and have a separate
attribute for trusting each piece. It has now been in one release,
though, so I guess we're stuck with it.)

>
> This all leads me to suggest that we should use the following syntax
> that is indepedant of the particular implementation choice, and instead
> represents the logical behaviour of the feature.
>
>   macTable=static|dynamic
>
> The mode of 'static' means that the MAC address table will always
> match MACs recorded in libvirt guest XML. This is the new mode
> that your patches implement
>
> The mode of 'dynamic' means that the MAC address table will be
> able to dynamically update itself as the guest changes MAC addresses.
> In the near term, this will use the traditional bridge learning
> mode. In the long term, if we can get MAC change notifications from
> QEMU, we can switch this over to use the new bridge features.

I do think that eventually libvirt should make self-management of the
mac table the default, but even then I think we need to retain the
ability to disable it and let the kernel do as it may, just in case
there are compatibility problems if for no other reason.

Also, I think that enabling/disabling guest-initiated MAC address
changes should be possible at the per-interface level, not (just)
per-network. This can be done with the above-mentioned
trustGuestRxFilters attribute (which can be set in a single interface,
or for a network or network portgroup).

So I like naming the attribute "macTable", as that is more generic, but
I'm not sure about the values "static" and "dynamic", because "dynamic"
could describe two different things (dynamically changed by the kernel
via learning and flood, or dynamically changed via libvirt being an
omniscient hypervisor god that knows what to plug into the table), and
"static" is something that can already be described via setting
trustGuestRxFilters='no' combined with a "let libvirt control the mac
table" option.

Maybe the problem is that you're providing a good name based entirely on
the differences in behavior the guests can see, but what I'm looking for
is a way to control how that behavior is provided, so more of a backend
thing. (Note that if the "backend" is the bridge module, "static" isn't
possible now and never will be, while if the backend is explicit
management by libvirt, then "dynamic" isn't currently available, but
will be in the future).

(Hmm, I also wonder if there will ever be a situation where "macTable"
will be ambiguous due to the need to specify options for an rx-filter
mac table in addition to this table of forwarding next-hops)

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list