On 03/30, Jesper Dangaard Brouer wrote:
On 30/03/2023 01.19, Stanislav Fomichev wrote:
> On 03/29, Jesper Dangaard Brouer wrote:
>
> > On 29/03/2023 19.18, Stanislav Fomichev wrote:
> > > On 03/29, Jesper Dangaard Brouer wrote:
> > >
> > > > On 28/03/2023 23.58, Stanislav Fomichev wrote:
> > > > > On 03/28, Jesper Dangaard Brouer wrote:
> > > > > > The RSS hash type specifies what portion of packet data NIC hardware used
> > > > > > when calculating the RSS hash value. The RSS types are focused on Internet
> > > > > > traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often gets hash
> > > > > > value zero and no RSS type. For L3 the focus is IPv4 vs. IPv6, and for L4
> > > > > > primarily TCP vs. UDP, but some hardware supports SCTP.
> > > > >
> > > > > > Hardware RSS types are encoded differently for each hardware NIC. Most
> > > > > > hardware represents the RSS hash type as a number. Determining L3 vs. L4
> > > > > > often requires a mapping table, as there often isn't a pattern or sorting
> > > > > > according to OSI layer.
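To illustrate the mapping-table point, here is a minimal userspace C sketch. The hardware type numbers (`HW_RSS_*`) and the unified flag values are hypothetical, not taken from any real NIC or from the patch; a driver would index a table like this with whatever number its RX descriptor reports.

```c
#include <stdint.h>

/* Hypothetical unified flags (illustrative values only). */
#define RSS_L3_IPV4  (1u << 0)
#define RSS_L3_IPV6  (1u << 1)
#define RSS_L4       (1u << 3)
#define RSS_L4_TCP   (1u << 4)
#define RSS_L4_UDP   (1u << 5)

/* Hypothetical hardware encoding: an unordered enumeration, so L3 vs. L4
 * cannot be derived from the number itself. */
enum hw_rss_type {
	HW_RSS_NONE   = 0,
	HW_RSS_TCP_V6 = 1,
	HW_RSS_IPV4   = 2,
	HW_RSS_UDP_V4 = 3,
	HW_RSS_IPV6   = 4,
	HW_RSS_TCP_V4 = 5,
	HW_RSS_UDP_V6 = 6,
	HW_RSS_MAX
};

/* Lookup table translating the hardware number into unified flags. */
static const uint16_t hw_to_unified[HW_RSS_MAX] = {
	[HW_RSS_NONE]   = 0,
	[HW_RSS_TCP_V6] = RSS_L3_IPV6 | RSS_L4 | RSS_L4_TCP,
	[HW_RSS_IPV4]   = RSS_L3_IPV4,
	[HW_RSS_UDP_V4] = RSS_L3_IPV4 | RSS_L4 | RSS_L4_UDP,
	[HW_RSS_IPV6]   = RSS_L3_IPV6,
	[HW_RSS_TCP_V4] = RSS_L3_IPV4 | RSS_L4 | RSS_L4_TCP,
	[HW_RSS_UDP_V6] = RSS_L3_IPV6 | RSS_L4 | RSS_L4_UDP,
};

static uint16_t map_hw_rss_type(unsigned int hw_type)
{
	/* Unknown or out-of-range hardware values map to "no RSS type". */
	if (hw_type >= HW_RSS_MAX)
		return 0;
	return hw_to_unified[hw_type];
}
```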
> > > > >
> > > > > > This patch introduces an XDP RSS hash type (xdp_rss_hash_type) that can
> > > > > > both be seen as a number that is ordered according to OSI layer, and be bit
> > > > > > masked to separate IPv4 and IPv6 types for L4 protocols. Room is available
> > > > > > for extending it later while keeping these properties. This maps and
> > > > > > unifies the differences between hardware-specific hashes.
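A minimal sketch of what such a layout could look like, with made-up values (the actual values in the patch may differ): the low bits carry the L3 family, and the L4 protocol is stored as a small number in a higher nibble. Any L4 type then compares greater than any pure L3 type, and masking separates the two parts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, illustrative values only:
 *   bits 0-3: L3 family (1 = IPv4, 2 = IPv6)
 *   bits 4-7: L4 protocol as a number (0 = none), leaving room for
 *             up to 15 L4 protocols without changing the mask.
 */
enum rss_type_v1 {
	RSS_V1_NONE        = 0x00,
	RSS_V1_L3_IPV4     = 0x01,
	RSS_V1_L3_IPV6     = 0x02,
	RSS_V1_L4          = 0x10, /* lowest possible L4 value */
	RSS_V1_L4_TCP      = 0x10,
	RSS_V1_L4_UDP      = 0x20,
	RSS_V1_L4_SCTP     = 0x30,

	RSS_V1_L4_IPV4_TCP = RSS_V1_L4_TCP | RSS_V1_L3_IPV4,
	RSS_V1_L4_IPV6_UDP = RSS_V1_L4_UDP | RSS_V1_L3_IPV6,
};

#define RSS_V1_L3_MASK 0x0Fu
#define RSS_V1_L4_MASK 0xF0u

/* For values within this layout, both L4 checks give the same answer. */
static bool is_l4_by_mask(uint16_t t)  { return (t & RSS_V1_L4_MASK) != 0; }
static bool is_l4_by_order(uint16_t t) { return t >= RSS_V1_L4; }

/* The L3 family is recovered by masking, independent of the L4 part. */
static bool is_ipv6(uint16_t t)
{
	return (t & RSS_V1_L3_MASK) == RSS_V1_L3_IPV6;
}
```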
> > > > >
> > > > > Looks good overall. Any reason we're making this specific layout?
> > >
> > > > One important goal is to have a simple/fast way to determine L3 vs. L4,
> > > > because an L4 hash can be used for flow handling (e.g. load-balancing).
> > >
> > > > With the layout below you can do:
> > >
> > > > if (rss_type & XDP_RSS_TYPE_L4_MASK)
> > > >     hw_hash_do_LB = true;
> > >
> > > > Or using it as a number:
> > >
> > > > if (rss_type > XDP_RSS_TYPE_L4)
> > > >     hw_hash_do_LB = true;
> > >
> > > Why is it strictly better than the following?
> > >
> > > if (rss_type & (TYPE_UDP | TYPE_TCP | TYPE_SCTP)) {}
> > >
>
> > See V2: I dropped the idea of this being a number (that was not a
> > good idea).
>
> 👍
>
> > > If we add some new L4 format, the bpf programs can be updated to support
> > > it?
> > >
> > > > I'm very open to changes to my "specific" layout. I am in doubt whether
> > > > using it as a number is the right approach and worth the trouble.
> > >
> > > > > Why not simply the following?
> > > > >
> > > > > enum {
> > > > >     XDP_RSS_TYPE_NONE = 0,
> > > > >     XDP_RSS_TYPE_IPV4 = BIT(0),
> > > > >     XDP_RSS_TYPE_IPV6 = BIT(1),
> > > > >     /* IPv6 with extension header. */
> > > > >     /* let's note ^^^ it in the UAPI? */
> > > > >     XDP_RSS_TYPE_IPV6_EX = BIT(2),
> > > > >     XDP_RSS_TYPE_UDP = BIT(3),
> > > > >     XDP_RSS_TYPE_TCP = BIT(4),
> > > > >     XDP_RSS_TYPE_SCTP = BIT(5),
> > >
> > > > We know these bits for UDP, TCP, SCTP (and IPSEC) are exclusive; they
> > > > cannot be set at the same time, e.g. a packet cannot be both UDP and
> > > > TCP. Thus, using these bits as a number makes sense to me, and is more
> > > > compact.
See below for why I'm wrong (about storing these as numbers).
> > >
> > > [..]
> > >
> > > > This BIT() approach also has the issue of extending it later (forward
> > > > compatibility). As mentioned, a common task will be to check if the
> > > > hash type is an L4 type. See mlx5 [patch 4/4], which needed to extend it
> > > > with IPSEC. Notice how my XDP_RSS_TYPE_L4_MASK covers all the bits that
> > > > can be used for new L4 types, such that existing progs doing the L4 check
> > > > will still work. It can of course be solved in the same way for this
> > > > BIT() approach by reserving some bits upfront in a mask.
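The forward-compatibility argument can be sketched as follows. Assume (hypothetically) that a program was written when only TCP/UDP/SCTP existed, and that an IPSEC bit is added later inside a reserved L4 mask: the mask-based check keeps working, whereas a check listing specific protocols silently misses the new type. Bit positions are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical BIT()-style layout with bits 3..7 reserved for L4 protocols. */
#define RSS_L3_IPV4   (1u << 0)
#define RSS_L3_IPV6   (1u << 1)
#define RSS_L4_TCP    (1u << 3)
#define RSS_L4_UDP    (1u << 4)
#define RSS_L4_SCTP   (1u << 5)
/* Added "later"; still falls inside the reserved mask. */
#define RSS_L4_IPSEC  (1u << 6)
#define RSS_L4_MASK   (0x1Fu << 3) /* bits 3..7 */

/* Check written against the reserved mask: picks up future L4 bits. */
static bool old_prog_mask_check(uint16_t t)
{
	return (t & RSS_L4_MASK) != 0;
}

/* Check listing the protocols known at the time: misses IPSEC. */
static bool old_prog_explicit_check(uint16_t t)
{
	return (t & (RSS_L4_TCP | RSS_L4_UDP | RSS_L4_SCTP)) != 0;
}
```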
> > >
> > > We're using 6 bits out of 64; we should be good for a while? If there
> > > is ever a forward compatibility issue, we can always come up with
> > > a new kfunc.
>
> > I want/need to store the RSS type in the xdp_frame, for XDP_REDIRECT and
> > SKB use-cases. Thus, I don't want to use 64 bits/8 bytes, as xdp_frame
> > size is limited (a bigger xdp_frame reduces the headroom available for
> > expansion).
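A sketch of the size constraint. The struct below is a hypothetical stand-in, not the real `struct xdp_frame`; it only illustrates keeping the hash type in 16 bits next to the 32-bit hash value, and checking at compile time that all defined flag bits still fit.

```c
#include <stdint.h>

/* Hypothetical flags; the highest defined bit must stay below bit 16. */
#define RSS_L3_IPV4   (1u << 0)
#define RSS_L3_IPV6   (1u << 1)
#define RSS_L4_TCP    (1u << 4)
#define RSS_L4_UDP    (1u << 5)
#define RSS_L4_SCTP   (1u << 6)
#define RSS_L4_IPSEC  (1u << 7)
#define RSS_ALL_BITS  (RSS_L3_IPV4 | RSS_L3_IPV6 | RSS_L4_TCP | \
		       RSS_L4_UDP | RSS_L4_SCTP | RSS_L4_IPSEC)

/* Fails to compile if a future bit no longer fits in a u16. */
_Static_assert(RSS_ALL_BITS <= UINT16_MAX, "RSS type must fit in 16 bits");

/* Hypothetical compact RX metadata stored alongside a frame. */
struct frame_rx_meta {
	uint32_t rx_hash;      /* hash value computed by the NIC */
	uint16_t rx_hash_type; /* flags describing what was hashed */
	uint16_t pad;          /* keeps the struct at 8 bytes on common ABIs */
};

_Static_assert(sizeof(struct frame_rx_meta) == 8, "unexpected padding");
```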
>
> > >
> > > One other related question I have is: should we export the type
> > > over some additional new kfunc argument? (instead of abusing the return
> > > type)
>
> > Good question. I was also wondering if it wouldn't be better to add
> > another kfunc argument with the rss_hash_type?
>
> > That will change the call signature, so that will not be easy to handle
> > between kernel releases.
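To make the two prototype options concrete, here is a userspace mock. It is not the kernel kfunc: `struct rx_desc_mock`, both function names and the choice of `-ENODATA` are hypothetical. Variant A overloads the return value (negative errno on failure, RSS type on success); variant B returns 0/-errno and writes the type through an extra output argument, which is the direction this thread ends up taking for V3.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for what a driver would read from its RX descriptor. */
struct rx_desc_mock {
	int      hash_valid;
	uint32_t hash;
	uint16_t rss_type;
};

/* Variant A: RSS type via the return value, hash via pointer. */
static int rx_hash_ret_type(const struct rx_desc_mock *d, uint32_t *hash)
{
	if (!d->hash_valid)
		return -ENODATA;
	*hash = d->hash;
	return d->rss_type; /* >= 0 on success */
}

/* Variant B: 0/-errno return, RSS type via a separate output argument. */
static int rx_hash_arg_type(const struct rx_desc_mock *d, uint32_t *hash,
			    uint16_t *rss_type)
{
	if (!d->hash_valid)
		return -ENODATA;
	*hash = d->hash;
	*rss_type = d->rss_type;
	return 0;
}
```

Variant B keeps the error channel separate from the data, and (as noted below in the thread) having the type as a typed argument also gets it exported to BTF without an explicit BTF_TYPE_EMIT.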
>
> Agreed with Toke on a separate thread; it might not be too late to fit it
> into an rc.
>
> > > Maybe that will let us drop the explicit BTF_TYPE_EMIT as well?
>
> > Sure, if we define it as an argument, then it will automatically be
> > exported as BTF.
>
> > > > > }
> > > > >
> > > > > And then using XDP_RSS_TYPE_IPV4|XDP_RSS_TYPE_UDP vs
> > > > > XDP_RSS_TYPE_IPV6|XXX ?
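A self-contained sketch of the BIT()-only alternative proposed above, where a program composes L3 and L4 flags itself. `BIT()` is defined locally because userspace C has no such macro; the values follow the enum in the quoted proposal, and the two helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Values as in the quoted BIT() proposal. */
enum {
	XDP_RSS_TYPE_NONE    = 0,
	XDP_RSS_TYPE_IPV4    = BIT(0),
	XDP_RSS_TYPE_IPV6    = BIT(1),
	XDP_RSS_TYPE_IPV6_EX = BIT(2), /* IPv6 with extension header */
	XDP_RSS_TYPE_UDP     = BIT(3),
	XDP_RSS_TYPE_TCP     = BIT(4),
	XDP_RSS_TYPE_SCTP    = BIT(5),
};

/* "IPv4 and UDP": both flags must be present. */
static bool is_udp_over_ipv4(uint32_t t)
{
	const uint32_t want = XDP_RSS_TYPE_IPV4 | XDP_RSS_TYPE_UDP;

	return (t & want) == want;
}

/* Any L4 protocol known today. */
static bool is_l4(uint32_t t)
{
	return (t & (XDP_RSS_TYPE_UDP | XDP_RSS_TYPE_TCP | XDP_RSS_TYPE_SCTP)) != 0;
}
```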
> > >
> > > > Do notice that I already do some level of OR'ing ("|") in this
> > > > proposal. The main difference is that I hide this from the driver, and
> > > > kind of pre-combine the valid combinations (enum values) that drivers
> > > > can select from. I do get the point, and I think I will come up with a
> > > > combined solution based on your input.
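One way to combine the two ideas, sketched with hypothetical names and values: keep individual bits that programs test, but also define pre-combined enum values so a driver's mapping table only picks from valid combinations. The generic `RSS_L4` bit (set for every L4-based hash) is an assumption of this sketch, not something proposed in the thread.

```c
#define BIT(n) (1u << (n))

enum rss_hash_type_sketch {
	/* Individual bits, tested by BPF programs. */
	RSS_L3_IPV4 = BIT(0),
	RSS_L3_IPV6 = BIT(1),
	RSS_L4      = BIT(3), /* assumed: set for every L4-based hash */
	RSS_L4_TCP  = BIT(4),
	RSS_L4_UDP  = BIT(5),
	RSS_L4_SCTP = BIT(6),

	/* Pre-combined values, selected by drivers. */
	RSS_TYPE_NONE        = 0,
	RSS_TYPE_L3_IPV4     = RSS_L3_IPV4,
	RSS_TYPE_L3_IPV6     = RSS_L3_IPV6,
	RSS_TYPE_L4_IPV4_TCP = RSS_L3_IPV4 | RSS_L4 | RSS_L4_TCP,
	RSS_TYPE_L4_IPV4_UDP = RSS_L3_IPV4 | RSS_L4 | RSS_L4_UDP,
	RSS_TYPE_L4_IPV6_TCP = RSS_L3_IPV6 | RSS_L4 | RSS_L4_TCP,
	RSS_TYPE_L4_IPV6_UDP = RSS_L3_IPV6 | RSS_L4 | RSS_L4_UDP,
};
```

Drivers never OR bits themselves; they return one of the `RSS_TYPE_*` values, while programs can still test single bits such as `RSS_L3_IPV6` or `RSS_L4`.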
> > >
> > >
> > > > The RSS hashing types and combinations come from M$ standards:
> > > > [1] https://learn.microsoft.com/en-us/windows-hardware/drivers/network/rss-hashing-types#ipv4-hash-type-combinations
> > >
> > > My main concern here is that we're over-complicating it with the masks
> > > and the format. With the explicit bits we can easily map to that
> > > spec you mention.
>
> > See if you like my RFC-V2 proposal better.
> > It should go more in your direction.
>
> Yeah, I like it better. Btw, why have a separate bit for XDP_RSS_BIT_EX?
Yes, we can rename the EX bit define (which is in V2). I reduced the
name length because it allowed keeping the code on one line when OR'ing.
> Any reason it's not a XDP_RSS_L3_IPV6_EX within XDP_RSS_L3_MASK?
>
Hmm... I guess it belongs with L3.
Do notice that both IPv4 and IPv6 can have variable-length headers after
their fixed header, called options (IPv4) and extension headers (IPv6).
(Mlx4 HW provides this info for IPv4, but I didn't extend
xdp_rss_hash_type in that patch.)
Thus, we could have a single BIT that is valid for both IPv4 and IPv6.
(Having this info can help speed up packet parsing.)
A separate bit for both v4/v6 sounds good. But thinking more about it,
not sure what the users are supposed to do with it. Whether the flow is
hashed over the extension header should be a config option, not a
per-packet signal?
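To illustrate the parsing argument for a single bit covering IPv4 options and IPv6 extension headers, a sketch with a hypothetical `RSS_L3_DYNHDR` flag (name and value are assumptions). It further assumes the hardware sets the flag whenever such headers are present; if that holds, a parser can take a fixed-header fast path for either family.

```c
#include <stdint.h>

#define RSS_L3_IPV4    (1u << 0)
#define RSS_L3_IPV6    (1u << 1)
#define RSS_L3_DYNHDR  (1u << 2) /* hypothetical: options/ext headers present */

/* Returns the L3 header length a parser could assume without walking
 * options/extension headers, or -1 when it must parse them (or when
 * the L3 family is unknown). */
static int fixed_l3_hdr_len(uint16_t t)
{
	if (t & RSS_L3_DYNHDR)
		return -1;
	if (t & RSS_L3_IPV4)
		return 20; /* IPv4 header without options */
	if (t & RSS_L3_IPV6)
		return 40; /* IPv6 fixed header */
	return -1;
}
```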
[...]
>
> > > For example, for forward compat, I'm not sure we can assume that people
> > > will do:
> > > "rss_type & XDP_RSS_TYPE_L4_MASK"
> > > instead of something like:
> > > "rss_type & (XDP_RSS_TYPE_L4_IPV4_TCP|XDP_RSS_TYPE_L4_IPV4_UDP)"
> > >
>
> > This code is allowed in V2, and should be. It is the BPF programmer's
> > choice in the second snippet not to be forward compatible with newer L4
> > types.
>
The above code made me realize I was wrong and you are right: we should
represent the L4 types as BITs (and not as numbers).
Even though a single packet cannot be both UDP and TCP at the same time,
it is reasonable to have a code path that wants to match both UDP
and TCP. If L4 types are BITs, code can do a single compare (via
OR'ing), while if they are numbers we need more compares.
Thus, I'll change the scheme in V3 to use BITs.
So you are saying that the following:
	if (rss_type & (TCP|UDP))
is much faster than the following:
	proto = rss_type & L4_MASK;
	if (proto == TCP || proto == UDP)
?
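The two styles of check can be written side by side. With L4 protocols as bits, one AND covers both; with the L4 protocol stored as a number, the program extracts the field and compares twice. The values are hypothetical, and which form is faster in practice depends on the compiler and JIT, so this only shows that the two are equivalent for valid inputs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scheme 1: L4 protocols as individual bits (hypothetical values). */
#define BIT_L4_TCP   (1u << 4)
#define BIT_L4_UDP   (1u << 5)
#define BIT_L4_SCTP  (1u << 6)

static bool tcp_or_udp_bits(uint16_t t)
{
	return (t & (BIT_L4_TCP | BIT_L4_UDP)) != 0; /* one AND + test */
}

/* Scheme 2: L4 protocol as a number inside a mask (hypothetical values). */
#define NUM_L4_MASK  0xF0u
#define NUM_L4_TCP   0x10u
#define NUM_L4_UDP   0x20u
#define NUM_L4_SCTP  0x30u

static bool tcp_or_udp_num(uint16_t t)
{
	uint16_t proto = t & NUM_L4_MASK;                  /* extract field */

	return proto == NUM_L4_TCP || proto == NUM_L4_UDP; /* two compares */
}
```

Note that with scheme 2 the shortcut `t & (NUM_L4_TCP | NUM_L4_UDP)` would be wrong: 0x30 (SCTP) shares bits with both TCP and UDP, so it would match too.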
idk, as long as we have enough bits to represent everything, I'm fine
with either way, up to you. (Not sure how much you want to constrain the
data to fit it into xdp_frame; assuming u16 is fine?)
> > > > > > This proposal changes the kfunc API bpf_xdp_metadata_rx_hash() to
> > > > > > return this RSS hash type on success.
>
> > This is the real question (as also raised above)...
> > Should we use return value or add an argument for type?
>
> Let's fix the prototype while it's still early in the rc?
Okay, in V3 I will propose adding an argument for the type then.
SG, thx!
> Maybe also extend the tests to drop/decode/verify the mask?
Yes, I/we obviously need to update the selftests.
One problem with the selftests is that they use veth in SKB-based mode,
and SKBs have lost the RSS hash type info, which is reduced to a single
bit telling us whether the hash was L4-based. Thus, it's hard to do
e.g. UDP-type verification, but I guess we can check that the expected
UDP packet has RSS type L4.
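A sketch of what the selftest can and cannot verify, assuming the SKB path preserves only a single "hash was L4-based" flag (the kernel's `skb->l4_hash` bit). The flag values and the two conversion helpers are hypothetical; they show why UDP vs. TCP cannot be checked through veth in SKB mode, while "is L4" can.

```c
#include <stdbool.h>
#include <stdint.h>

#define RSS_L3_IPV4  (1u << 0)
#define RSS_L4       (1u << 3) /* generic "L4-based hash" bit (assumed) */
#define RSS_L4_TCP   (1u << 4)
#define RSS_L4_UDP   (1u << 5)

/* What survives when the hash goes through an SKB: one bit. */
static bool to_skb_l4_hash(uint16_t rss_type)
{
	return (rss_type & RSS_L4) != 0;
}

/* What a veth-style path could report back from that single bit. */
static uint16_t from_skb_l4_hash(bool l4_hash)
{
	return l4_hash ? RSS_L4 : 0;
}

/* Selftest-style check: an expected UDP packet must at least be L4. */
static bool udp_packet_ok(uint16_t reported)
{
	return (reported & RSS_L4) != 0;
}
```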
Yeah, sounds fair.
In xdp_hw_metadata, I will add something that uses the RSS type bits. I
was thinking of matching against the L4-UDP RSS type, as the program only
redirects UDP packets to AF_XDP, so we can verify via HW info that it was
a UDP packet.
Or maybe just dump it, idk.
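For the xdp_hw_metadata side, dumping the type could look like this userspace helper that turns the flags into a readable string. The flag names and values are hypothetical and would need to follow whatever V3 defines; bits not listed in the table are silently ignored.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RSS_L3_IPV4  (1u << 0)
#define RSS_L3_IPV6  (1u << 1)
#define RSS_L4       (1u << 3)
#define RSS_L4_TCP   (1u << 4)
#define RSS_L4_UDP   (1u << 5)

static const struct {
	uint32_t bit;
	const char *name;
} rss_names[] = {
	{ RSS_L3_IPV4, "IPV4" },
	{ RSS_L3_IPV6, "IPV6" },
	{ RSS_L4,      "L4" },
	{ RSS_L4_TCP,  "TCP" },
	{ RSS_L4_UDP,  "UDP" },
};

/* Writes e.g. "IPV4|L4|UDP" into buf (len >= 1); "NONE" if no known bit is set. */
static void rss_type_str(uint32_t t, char *buf, size_t len)
{
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(rss_names) / sizeof(rss_names[0]); i++) {
		if (!(t & rss_names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, rss_names[i].name, len - strlen(buf) - 1);
	}
	if (buf[0] == '\0')
		snprintf(buf, len, "NONE");
}
```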
--Jesper