Hi Or, On Sun, Oct 14, 2018 at 02:24:42PM +0300, Or Gerlitz wrote: > On Sun, Oct 14, 2018 at 12:24 PM Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: > > On Sun, Oct 14, 2018 at 09:42:42AM +0300, Or Gerlitz wrote: [...] > > > for udp based tunneling this is valid practice, b/c a driver can get the tunnel > > > type <--> udp dest port relation from the stack through the udp tunnel ndo. > > > > > > HW offloading wise, I think it would be better first pursue Januk and Co > > > proposal which deals with the problematic part of tc/tunnels -- ingress rules > > > set on tunnel devices (decap rules). The RFC looks very promising and > > > I understand this is going to be reposed as non-rfc early in the next cycle > > > > Sorry, I think there's a misunderstanding here. > > > > My patchset is of benefit for pure software/control plane approach: > > This allows users to narrow down tunnel configurations, existing > > approach is very flexible - probably a bit too much for people willing > > to validate things are correct - since it is allowing too loose > > configurations. This is leaving room to the user to make configuration > > mistakes that result in bad things, such as packets being tunneled > > through the wrong encapsulation type. > > I see your point, but I think that the fact that effectively all or most > of the IP based tunnels in the kernel conform to struct ip_tunnel_info > is a plus and not a minus.. control planes can do their mistakes if they > don't read the man, this wrong tunneling that you mentioned is not going > to live far beyond the next networking hop or even dropped earlier, within the > stack of this node. I think it is worse that just 'please read the manpage', look: The tunnel ID field is 64 bits, but it is trimmed down to 32 bits for VxLAN, and then it is 8 bits in case of ERSPAN for the session field. Semantics depend on the tunnel protocol. Depending on the tunneling protocol, the ID have different tunnel header field length. The control plane cannot reject a 0xff00ffff in ERSPAN because the ip_tunnel_info structure has no tunnel type semantics. I think it would be good if users get some sort of 'sorry, you cannot specify tunnel ID larger that 8 bits in ERSPAN'. Another example, you can also specify via tc/ingress a rule like: act_tunnel_key id 50 src-addr 192.168.2.1 with no fwd/mirred action in a bridge setup. The tunnel device is part of the bridge in this case, and the template passes the configuration to the tunnel device that is part of the bridge. So we cannot assume the fwd/mirred action always follow act_tunnel_key in software. Probably in HW offload it makes sense to always follow it up with fwd/mirred to keep things simple, but we already have scenarios in software where this is not the case. > > through the wrong encapsulation type. With this patchset, packets > > going to the wrong tunnel devices will be simply dropped - and we > > could even do more specific validation from control plane after this. > > This patchset is backward compatible, so it doesn't restrict for the > > existing flexibility that users may want for this. > > > > This patchset _never_ meant to replace Jakub's work nor any HW offload > > infrastructure. After reading Jakub's email, I was just suggesting > > that this may (probably) still help drivers too, since this would > > provide more hints to the driver. > > well, yes, more hints and sort of no (see next) > > > provide more hints to the driver. Please, let me know if this is > > causing any interference with your ongoing HW driver development in > > some way, and if so, in what way. > > for example, if newer controller wants to work over older kernel that doesn't > have the new flag, they have to write code that can go both ways This is backward compatible, your controller can keep using the existing approach forever. Anyway, I think the problem you're refering about older kernel and new controller is an interesting one but a different problem: This is an existing limitation in netlink. Currently, there is not way to know what the API supports other than doing probing, and probing is something may not even help. I already proposed something to address this problem in case you're interested [1]. > , this is doable > -- but do we want to do that just for the sake of reacting to user > mis-configurations? > > I am open / will be happy to hear more opinions here. Please note that this new feature is optional, for people willing to have an interface/control plane that can validate what they are configuring can be good. Users tend to make mistakes, manpage is last resort, well if control plane can help / provide hints on what is wrong, things become easier for users. Thanks for your feedback! [1] https://lwn.net/Articles/746776/