Hi John,

I went backwards to summarize at the top after going through your email.

TL;DR version 0.1: you provide a good use case where it makes sense to do
things in the kernel. IMO, you could make the same argument for ACLs,
IPv4 forwarding, etc. if your embedded switch could do them, and the
kernel bloats. I am always biased toward moving all policy control to
user space instead of bloating the kernel.

On Thu, 2012-02-09 at 20:14 -0800, John Fastabend wrote:
> >
> > Hi Jamal,
> >
> > The user space app in this case would listen for FDB updates to the SW
> > bridge and then mirror them at the embedded NIC. In this case it seems
> > easier to just add a notifier chain and let the kernel keep these in
> > sync. Otherwise we need a daemon in user space to replicate these.
> >

A user space daemon, if you need to ensure synchronization. That's what I
meant when I said there was a "disadvantage" compared to the simple case
where the goal is always to synchronize.

> > On the other hand if you could make the same RTM_NEWNEIGH, RTM_DELNEIGH,
> > and RTM_GETNEIGH work for the bridge, embedded bridge, and macvlan you
> > would have one common interface to drive these. But the bridge already
> > has this protocol/msgtype so that would require either some demux or
> > new protocol/msgtype pairs to be created.
> >

The bridge is very netlink-friendly these days. Given that the rest of the
network stack (e.g. the *NEIGH* messages you mention above) talks netlink
to user space, it should be workable.

> > Let me think on it. I'm tempted by the simplicity of adding notifier
> > hooks though.

If something is missing bridge-side it may need to be added (as per
Stephen's comment); I just took it one step further by pointing out that
those notifiers also need to speak netlink.

> Actually because the bridge is adding/removing fdb entries dynamically
> maybe it's best this gets done in kernel. Here's the example case,
>
> [..]
>
> With the flow by letters above hope this is not too difficult to follow.
> (A) veth0, a virtual device, transmits a packet destined for ethx.y
> (B) SW bridge receives the frame and updates its FDB, flooding to C
> (C) eth0, the PF in this case, sends the frame to the HW backed by the
>     embedded bridge

Following so far. Can you have more than one PF per embedded switch? Or
is the intent here purely VM/VF separation?

> (D) The HW embedded switch has a static entry for ethx.y and forwards
>     the frame to the VF, or, if it's a broadcast frame, also floods it
>     to the wire and ethx.y

Nod.

> (E) ethx.y receives the frame and generates a response to the dest MAC
>     of veth0

Nod. Since you said in (D) that the entries in the switch are static, I
am assuming that at this point neither ethx.y nor veth0 exists in the
embedded FDB.

> Now here is the potential issue,
>
> (G) The frame transmitted from ethx.y with the destination address of
>     veth0 but the embedded switch is not a learning switch. If the FDB
>     update is done in user space it's possible (likely?) that the FDB
>     entry for veth0 has not been added to the embedded switch yet.

Ok, got it: the catch here is that the switch is not capable of learning.

I think this depends on where learning is done. Your intent is to use the
S/W bridge to do the learning for you, i.e. in the kernel. That makes the
S/W bridge a must-have for this to run, and that may well be true for
your use case. But what if I don't want to run the S/W bridge at all?

I've been making the point that with a simple knob (Stephen doesn't like
adding such a knob), the S/W bridge could defer learning to user space.
[That way you can add a lot of richness, e.g. ACLs restricting which MAC
addresses are allowed to talk to which others.] And if I bypass the S/W
bridge altogether and either learn in user space or use a static config
to populate the embedded switch, I don't see the issue.
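For the user-space route, the daemon would subscribe to rtnetlink neighbour notifications and replay bridge (AF_BRIDGE) FDB adds and deletes into the embedded switch. Below is a minimal decoding sketch, assuming the standard rtnetlink layout (`struct nlmsghdr`, then `struct ndmsg`, then attributes such as `NDA_LLADDR`) with constant values taken from the Linux uapi headers. It works on raw bytes so it runs without a socket; `parse_fdb_events`, the `build_neigh_msg` test helper and the `(event, ifindex, mac)` tuple are hypothetical names, and the step that actually programs the embedded switch is left out because no such interface is defined in this thread.

```python
import struct

# Constants as defined in the Linux uapi headers
# (linux/rtnetlink.h, linux/neighbour.h, linux/socket.h).
RTM_NEWNEIGH = 28
RTM_DELNEIGH = 29
NDA_LLADDR = 2
AF_BRIDGE = 7

NLMSGHDR = struct.Struct("=IHHII")  # nlmsg_len, type, flags, seq, pid
NDMSG = struct.Struct("=BBHiHBB")   # family, pad1, pad2, ifindex, state, flags, type
RTATTR = struct.Struct("=HH")       # rta_len, rta_type


def _align4(n):
    """Netlink messages and attributes are padded to 4-byte boundaries."""
    return (n + 3) & ~3


def parse_fdb_events(buf):
    """Yield (event, ifindex, mac) for each bridge FDB message in buf."""
    off = 0
    while off + NLMSGHDR.size <= len(buf):
        msg_len, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(buf, off)
        if msg_len < NLMSGHDR.size or off + msg_len > len(buf):
            break  # truncated or malformed message
        body = off + NLMSGHDR.size
        is_neigh = msg_type in (RTM_NEWNEIGH, RTM_DELNEIGH)
        if is_neigh and msg_len >= NLMSGHDR.size + NDMSG.size:
            family, _, _, ifindex, _state, _nflags, _ntype = NDMSG.unpack_from(buf, body)
            if family == AF_BRIDGE:  # ignore IPv4/IPv6 neighbour entries
                mac = None
                attr, end = body + NDMSG.size, off + msg_len
                while attr + RTATTR.size <= end:
                    rta_len, rta_type = RTATTR.unpack_from(buf, attr)
                    if rta_len < RTATTR.size:
                        break
                    if rta_type == NDA_LLADDR:
                        mac = bytes(buf[attr + RTATTR.size:attr + rta_len])
                    attr += _align4(rta_len)
                if mac is not None:
                    event = "add" if msg_type == RTM_NEWNEIGH else "del"
                    yield event, ifindex, ":".join(f"{b:02x}" for b in mac)
        off += _align4(msg_len)


def build_neigh_msg(msg_type, ifindex, mac, family=AF_BRIDGE):
    """Build one RTM_*NEIGH message for testing (hypothetical helper)."""
    lladdr = bytes.fromhex(mac.replace(":", ""))
    attr = RTATTR.pack(RTATTR.size + len(lladdr), NDA_LLADDR) + lladdr
    attr += b"\0" * (_align4(len(attr)) - len(attr))
    nd = NDMSG.pack(family, 0, 0, ifindex, 0, 0, 0)
    total = NLMSGHDR.size + len(nd) + len(attr)
    return NLMSGHDR.pack(total, msg_type, 0, 0, 0) + nd + attr
```

In a real daemon `buf` would come from `recv()` on an `AF_NETLINK`/`NETLINK_ROUTE` socket bound to the neighbour multicast group (`RTMGRP_NEIGH`), and each yielded tuple is where the embedded-switch update would go. Keeping this loop in user space is what lets ACL-style policy sit between "learned" and "programmed".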
>     Now we either have to flood the frame, which is not horrible but not
>     ideal, or worse, if the embedded switch does not support flooding,
>     send it to the wire and veth0 never receives it.

If it is a switch it has to flood, no? Otherwise it sounds broken.

>     If the SW bridge pushes the FDB update down into the embedded switch,
>     the address is for sure in the embedded switch's forwarding tables
>     and the switching works as expected.

Yes, there is a small gap between the S/W bridge learning an address and
that entry being synchronized to the embedded NIC switch. The gap gets
larger if you defer learning to user space. But as you said earlier,
packets are flooded during that gap, so do you care if the
synchronization doesn't happen immediately?

> So to handle this case correctly it's probably best IMHO to use a notifier
> hook. Having a RTM_GETNEIGH for the embedded switch implemented though
> would be nice for dumping the FDB of the embedded switch and SET/DEL
> could be used to configure the FDB when it's not being driven by the SW
> switch.

Of course, we should try to be minimalists here. Do you really need a
different *NEIGH* from what we already have? The problem with putting
policies in the kernel is that you keep adding more. Bloat user space
instead.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
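The synchronization-gap argument in this thread can be made concrete with a toy model (not taken from either mail). A software bridge learns veth0 at t=0 (step B), and that entry reaches a non-learning embedded switch only at `sync_delay`, a hypothetical stand-in for the cost of the push (roughly zero for an in-kernel notifier, larger for a user-space daemon). `can_flood` picks between the two outcomes listed for an unknown unicast: flood it, or send it to the wire. `EmbeddedSwitch`, `reply_egress` and the port names are illustrative, and MACs are represented by device names for readability.

```python
from dataclasses import dataclass, field

WIRE = "wire"  # uplink port of the embedded switch


@dataclass
class EmbeddedSwitch:
    """A non-learning switch: its FDB only changes when it is programmed."""
    ports: list
    can_flood: bool = True
    fdb: dict = field(default_factory=dict)  # MAC -> egress port

    def forward(self, in_port, dst_mac):
        """Return the set of ports a frame from in_port to dst_mac leaves on."""
        if dst_mac in self.fdb:
            return {self.fdb[dst_mac]}
        if self.can_flood:
            # Unknown unicast: flood to every port except the ingress port.
            return {p for p in self.ports if p != in_port}
        # No flood support: unknown destinations only go out to the wire.
        return {WIRE}


def reply_egress(sync_delay, reply_time, can_flood=True):
    """Where ethx.y's reply to veth0 (step G) goes at time reply_time.

    The SW bridge learns veth0 at t=0 and its entry, pointing at the PF
    port, lands in the embedded switch at t=sync_delay.
    """
    sw = EmbeddedSwitch(ports=["pf", "vf", WIRE], can_flood=can_flood)
    sw.fdb["ethx.y"] = "vf"  # static entry from step D
    if reply_time >= sync_delay:
        sw.fdb["veth0"] = "pf"  # the synchronization has happened
    return sw.forward("vf", "veth0")
```

With `sync_delay=0` the reply goes straight to the PF. Inside the gap (`reply_egress(5, 1)`) it is flooded to both the PF and the wire, so veth0 still receives it via the S/W bridge; only a switch without flood support (`can_flood=False`) loses the frame to the wire, which is the "otherwise it sounds broken" case.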