On Thu, Jul 25, 2024 at 04:17:29PM +0300, Ido Schimmel wrote:
> The TOS field in the IPv4 flow information structure ('flowi4_tos') is
> matched by the kernel against the TOS selector in IPv4 rules and routes.
> The field is initialized differently by different call sites. Some treat
> it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as
> RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC
> 791 TOS and initialize it using IPTOS_RT_MASK.
>
> What is common to all these call sites is that they all initialize the
> lower three DSCP bits, which fits the TOS definition in the initial IPv4
> specification (RFC 791).
>
> Therefore, the kernel only allows configuring IPv4 FIB rules that match
> on the lower three DSCP bits which are always guaranteed to be
> initialized by all call sites:
>
>  # ip -4 rule add tos 0x1c table 100
>  # ip -4 rule add tos 0x3c table 100
>  Error: Invalid tos.
>
> While this works, it is unlikely to be very useful. RFC 791 that
> initially defined the TOS and IP precedence fields was updated by RFC
> 2474 over twenty five years ago where these fields were replaced by a
> single six bits DSCP field.
>
> Extending FIB rules to match on DSCP can be done by adding a new DSCP
> selector while maintaining the existing semantics of the TOS selector
> for applications that rely on that.
>
> A prerequisite for allowing FIB rules to match on DSCP is to adjust all
> the call sites to initialize the high order DSCP bits and remove their
> masking along the path to the core where the field is matched on.
>
> However, making this change alone will result in a behavior change. For
> example, a forwarded IPv4 packet with a DS field of 0xfc will no longer
> match a FIB rule that was configured with 'tos 0x1c'.
>
> This behavior change can be avoided by masking the upper three DSCP bits
> in 'flowi4_tos' before comparing it against the TOS selectors in FIB
> rules and routes.
>
> Implement the above by adding a new function that checks whether a given
> DSCP value matches the one specified in the IPv4 flow information
> structure and invoke it from the three places that currently match on
> 'flowi4_tos'.
>
> Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK
> since the latter is not uAPI and we should be able to remove it at some
> point.
>
> No regressions in FIB tests:
>
>  # ./fib_tests.sh
>  [...]
>  Tests passed: 218
>  Tests failed:   0
>
> And FIB rule tests:
>
>  # ./fib_rule_tests.sh
>  [...]
>  Tests passed: 116
>  Tests failed:   0
>
> Signed-off-by: Ido Schimmel <idosch@xxxxxxxxxx>
> ---
>  include/net/ip_fib.h     | 7 +++++++
>  net/ipv4/fib_rules.c     | 2 +-
>  net/ipv4/fib_semantics.c | 3 +--
>  net/ipv4/fib_trie.c      | 3 +--
>  4 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 72af2f223e59..967e4dc555fa 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -22,6 +22,8 @@
>  #include <linux/percpu.h>
>  #include <linux/notifier.h>
>  #include <linux/refcount.h>
> +#include <linux/ip.h>

Why include linux/ip.h? That doesn't seem necessary for this change.

Apart from that,

Reviewed-by: Guillaume Nault <gnault@xxxxxxxxxx>

Thanks a lot!