Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

Stephen Hemminger <shemminger@xxxxxxxxxx> · Fri, 10 Feb 2012 08:39:17 -0800

On Fri, 10 Feb 2012 10:18:31 -0500
jamal <hadi@xxxxxxxxxx> wrote:

> Hi John,
> 
> I went backwards to summarize at the top after going through your email.
> 
> TL;DR version 0.1: 
> you provide a good use case where it makes sense to do things in the
> kernel. IMO, you could make the same argument if your embedded switch
> could do ACLs, IPv4 forwarding etc. And the kernel bloats.
> I am always bigoted to move all policy control to user space instead of
> bloating in the kernel.
> 
>  
> On Thu, 2012-02-09 at 20:14 -0800, John Fastabend wrote:
> 
> > > 
> > > Hi Jamal,
> > > 
> > > The user space app in this case would listen for FDB updates to the SW
> > > bridge and then mirror them at the embedded NIC. In this case it seems
> > > easier to just add a notifier chain and let the kernel keep these in
> > > sync. Otherwise we need a daemon in user space to replicate these.
> > > 
> 
> A user space daemon, if you need to ensure synchronization. That's what I
> meant when I said there was a "disadvantage" over the simple case when
> the goal is always to synchronize.
> 
> > > On the other hand if you could make the same RTM_NEWNEIGH, RTM_DELNEIGH,
> > > and RTM_GETNEIGH work for the bridge, embedded bridge, and macvlan you
> > > would have one common interface to drive these. But the bridge already
> > > has this protocol/msgtype so that would require either some demux or
> > > new protocol/msgtype pairs to be created. 
> > > 
> 
> The bridge is very netlink friendly these days. Given the rest of the
> network stack (*NEIGH* you mention above) talks netlink to user space
> it should be workable. 
> 
> > > Let me think on it. I'm tempted by the simplicity of adding notifier
> > > hooks though.
> 
> If something is missing bridge-side it may need to be added (as per
> Stephen's comment); I just took it one step further by saying those
> notifiers also need to speak netlink.
> 
> 
> > Actually because the bridge is adding/removing fdb entries dynamically
> > maybe it's best this gets done in kernel. Here's the example case,
> 
> [..]
> 
> > 
> > With the flow by letters above hope this is not too difficult to follow.
> 
> > (A) veth0 a virtual device transmits packet destined for ethx.y
> > (B) SW bridge receives frames and updates FDB flooding to C
> > (C) eth0 the PF in this case sends the frame to the HW backed by the
> >     embedded bridge
> 
> Following so far.
> Can you have more than one PF per embedded switch? Or is the intent here
> purely to do VMs/VF separation?
> 
> > (D) The HW embedded switch has a static entry for ethx.y and forwards
> >     the frame to the VF or if it's a broadcast frame also floods it to
> >     the wire and ethx.y
> 
> nod.
> 
> > (E) ethx.y receives the frame and generates a response to the dest mac of
> >     veth0
> 
> nod.
> Since you said in #D the entries in the switch are static, I am assuming
> at this point neither ethx.y nor veth0 exist in the embedded FDB.
> 
> > Now here is the potential issue,
> > 
> > (G) The frame is transmitted from ethx.y with the destination address of
> >     veth0, but the embedded switch is not a learning switch. If the FDB
> >     update is done in user space it's possible (likely?) that the FDB
> >     entry for veth0 has not been added to the embedded switch yet.
> 
> Ok, got it - so the catch here is the switch is not capable of learning.
> I think this depends on where learning is done. Your intent is to
> use the S/W bridge as something that does the learning for you, i.e. in
> the kernel. This makes the s/w bridge part of MUST-have-for-this-to-run.
> And that may be the case for your use case.
> 
> What if I don't want to run the S/W bridge at all?
> I've been making the point that with a simple knob (Stephen doesn't like
> adding such a knob), the SW bridge could defer learning to user space.
> [This way you can add a lot of richness, e.g. ACLs restricting which
> MAC addresses are allowed to talk to which ones, etc.]
> But if I bypass the s/w bridge altogether and learn in user space,
> or have a static config in which I populate the embedded switch, I don't
> see the issue.
> 
> > Now
> >     we either have to flood the frame which is not horrible but not
> >     ideal or worse if the embedded switch does not support flooding send
> >     it to the wire and veth0 never receives it. 
> 
> If it is a switch it has to flood, no? Otherwise it sounds broken.
> 
> > If the SW bridge pushes
> >     the FDB update down into the embedded switch the address is for
> >     sure in the embedded switch's forwarding tables and the switching
> >     works as expected.
> 
> Yes, there is a small gap between the s/w bridge learning and the
> synchronization happening to the embedded nic switch. That gap gets
> larger if you defer learning to user space. But like you said earlier,
> during that gap packets are flooded - and do you care if the
> synchronization doesn't happen immediately?
> 
> > So to handle this case correctly it's probably best IMHO to use a notifier
> > hook. Having a RTM_GETNEIGH for the embedded switch implemented though
> > would be nice for dumping the FDB of the embedded switch and SET/DEL
> > could be used to configure the FDB when it's not being driven by the SW
> > switch. Of course we should try to be minimalists here.
> 
> Do you need to have a different *NEIGH* than what we already have
> really?
> 
> The problem with putting policies in the kernel is you are gonna keep
> adding more. Bloat user space instead. 
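
The timing gap discussed above can be pictured with a small user-space
simulation. This is only a sketch under stated assumptions, not kernel
code and not the proposed patch: the class names, the callback signature
and the port label "pf" are all made up for illustration. It compares a
notifier that writes to the embedded switch at learning time with a
daemon that applies queued updates later.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout; none of these are kernel or iproute2 APIs.
FdbEvent = Tuple[str, str, str]  # (action, mac, port)


class SoftwareBridge:
    """Learning bridge that announces FDB changes to registered callbacks."""

    def __init__(self) -> None:
        self.fdb: Dict[str, str] = {}
        self.notifiers: List[Callable[[str, str, str], None]] = []

    def register_notifier(self, callback: Callable[[str, str, str], None]) -> None:
        self.notifiers.append(callback)

    def learn(self, src_mac: str, port: str) -> None:
        # Only announce real changes, as a learning bridge would.
        if self.fdb.get(src_mac) == port:
            return
        self.fdb[src_mac] = port
        for callback in self.notifiers:
            callback("add", src_mac, port)


class EmbeddedSwitch:
    """Non-learning switch: its table changes only when something writes to it."""

    def __init__(self) -> None:
        self.fdb: Dict[str, str] = {}

    def fdb_update(self, action: str, mac: str, port: str) -> None:
        if action == "add":
            self.fdb[mac] = port
        else:
            self.fdb.pop(mac, None)

    def forward(self, dst_mac: str) -> str:
        # Unknown destinations are flooded (assuming the switch can flood).
        return self.fdb.get(dst_mac, "flood")


def demo() -> Tuple[str, str, str]:
    veth0 = "02:00:00:00:00:01"  # made-up MAC address for veth0

    # Case 1: in-kernel style notifier, applied at the moment of learning.
    bridge, hw = SoftwareBridge(), EmbeddedSwitch()
    bridge.register_notifier(hw.fdb_update)
    bridge.learn(veth0, "pf")
    with_notifier = hw.forward(veth0)

    # Case 2: user-space style daemon, events queued and applied later.
    bridge, hw = SoftwareBridge(), EmbeddedSwitch()
    pending: List[FdbEvent] = []
    bridge.register_notifier(lambda a, m, p: pending.append((a, m, p)))
    bridge.learn(veth0, "pf")
    before_sync = hw.forward(veth0)   # reply from ethx.y arrives in the gap
    for event in pending:             # daemon finally gets scheduled
        hw.fdb_update(*event)
    after_sync = hw.forward(veth0)

    return with_notifier, before_sync, after_sync


if __name__ == "__main__":
    print(demo())
```

In this model the only difference between the two cases is when the
embedded table is written. Whether flooding during that window matters
is the open question in the thread.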

Some related discussion points:
 * the bridge needs to support control from both userspace (MSTP, TRILL, ...)
   and kernel space (offload etc)
 * the bridge forwarding database is simpler than, and different from, the
   existing neighbor table. I don't remember the details, but last time I
   checked, using the neighbor table in the bridge would be putting a square
   peg in a round hole.
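
One way to picture the mismatch in the second point, as a rough sketch
only: the details are not spelled out above, so the fields below are an
assumption about the general shape of the two tables (an FDB entry maps
a MAC address to a bridge port; a neighbor entry maps a protocol address
on a device to a link-layer address). The class and field names are
hypothetical and do not mirror the kernel structures.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FdbEntry:
    """Assumed shape of a bridge FDB entry: L2 address -> bridge port."""
    mac: str
    port: str
    is_static: bool = False


@dataclass(frozen=True)
class NeighEntry:
    """Assumed shape of a neighbor entry: L3 address on a device -> L2 address."""
    ip: str
    dev: str
    lladdr: str
    state: str = "REACHABLE"


def fdb_key(entry: FdbEntry) -> str:
    # The FDB is looked up by MAC address alone.
    return entry.mac


def neigh_key(entry: NeighEntry) -> Tuple[str, str]:
    # The neighbor table is looked up by protocol address and device.
    return (entry.ip, entry.dev)


# Build one table of each kind to show the different lookup keys.
fdb: Dict[str, FdbEntry] = {}
neigh: Dict[Tuple[str, str], NeighEntry] = {}

f = FdbEntry(mac="02:00:00:00:00:01", port="eth0")
n = NeighEntry(ip="192.0.2.1", dev="eth0", lladdr="02:00:00:00:00:01")
fdb[fdb_key(f)] = f
neigh[neigh_key(n)] = n
```

Under these assumptions an FDB entry has no protocol address to serve
as a neighbor-table key, and a neighbor entry has no bridge port, so one
cannot be stored in the other without inventing fields.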
