On Tue, Jul 04, 2023 at 04:18:04PM +0200, Jesper Dangaard Brouer wrote:
>
>
> On 04/07/2023 13.02, Larysa Zaremba wrote:
> > On Tue, Jul 04, 2023 at 12:23:45PM +0200, Jesper Dangaard Brouer wrote:
> > >
> > > On 04/07/2023 10.23, Larysa Zaremba wrote:
> > > > On Mon, Jul 03, 2023 at 01:15:34PM -0700, John Fastabend wrote:
> > > > > Larysa Zaremba wrote:
> > > > > > Implement functionality that enables drivers to expose VLAN tag
> > > > > > to XDP code.
> > > > > >
> > > > > > Signed-off-by: Larysa Zaremba <larysa.zaremba@xxxxxxxxx>
> > > > > > ---
> > > > > > Documentation/networking/xdp-rx-metadata.rst | 8 +++++++-
> > > > > > include/linux/netdevice.h | 2 ++
> > > > > > include/net/xdp.h | 2 ++
> > > > > > kernel/bpf/offload.c | 2 ++
> > > > > > net/core/xdp.c | 20 ++++++++++++++++++++
> > > > > > 5 files changed, 33 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> > > > > > index 25ce72af81c2..ea6dd79a21d3 100644
> > > > > > --- a/Documentation/networking/xdp-rx-metadata.rst
> > > > > > +++ b/Documentation/networking/xdp-rx-metadata.rst
> > > > > > @@ -18,7 +18,13 @@ Currently, the following kfuncs are supported. In the future, as more
> > > > > > metadata is supported, this set will grow:
> > > > > > .. kernel-doc:: net/core/xdp.c
> > > > > > - :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash
> > > > > > + :identifiers: bpf_xdp_metadata_rx_timestamp
> > > > > > +
> > > > > > +.. kernel-doc:: net/core/xdp.c
> > > > > > + :identifiers: bpf_xdp_metadata_rx_hash
> > > > > > +
> > > > > > +.. kernel-doc:: net/core/xdp.c
> > > > > > + :identifiers: bpf_xdp_metadata_rx_vlan_tag
> > > > > > An XDP program can use these kfuncs to read the metadata into stack
> > > > > > variables for its own consumption. Or, to pass the metadata on to other
> > > [...]
> > > > > > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > > > > > index 41e5ca8643ec..f6262c90e45f 100644
> > > > > > --- a/net/core/xdp.c
> > > > > > +++ b/net/core/xdp.c
> > > > > > @@ -738,6 +738,26 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash,
> > > > > > return -EOPNOTSUPP;
> > > > > > }
> > > > > > +/**
> > > > > > + * bpf_xdp_metadata_rx_vlan_tag - Get XDP packet outermost VLAN tag with protocol
> > > > > > + * @ctx: XDP context pointer.
> > > > > > + * @vlan_tag: Destination pointer for VLAN tag
> > > > > > + * @vlan_proto: Destination pointer for VLAN protocol identifier in network byte order.
> > > > > > + *
> > > > > > + * In case of success, vlan_tag contains VLAN tag, including 12 least significant bytes
> > > > > > + * containing VLAN ID, vlan_proto contains protocol identifier.
> > > > >
> > > > > Above is a bit confusing to me at least.
> > > > >
> > > > > The vlan tag would be both the 16bit TPID and 16bit TCI. What fields
> > > > > are to be included here? The VlanID or the full 16bit TCI meaning the
> > > > > PCP+DEI+VID?
> > > >
> > > > It contains PCP+DEI+VID, in patch 16 ("selftests/bpf: Add flags and new hints to
> > > > xdp_hw_metadata") this is more clear, because the tag is parsed.
> > > >
> > >
> > > Do we really care about the "EtherType" proto (in VLAN speak TPID = Tag
> > > Protocol IDentifier)?
> > > I mean, it can basically only have two values[1], and we just wanted to
> > > know if it is a VLAN (that hardware offloaded/removed for us):
> >
> > If we assume everyone follows the standard, this would be correct.
> > But apparently, some applications use some ambiguous value as a TPID [0].
> >
> > So it is not hard to imagine, some NICs could allow you to configure your
> > custom TPID. I am not sure if any in-tree drivers actually do this, but I think
> > it's nice to provide some flexibility on XDP level, especially considering
> > network stack stores full vlan_proto.
> >
>
> I'm buying your argument, and agree it makes sense to provide TPID in
> the call signature, given that weird hardware exists that allows people
> to configure a custom TPID.
>
> Looking through kernel defines (in uapi/linux/if_ether.h) I see evidence
> that funky QinQ EtherTypes have been used in the past:
>
> #define ETH_P_QINQ1 0x9100 /* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
> #define ETH_P_QINQ2 0x9200 /* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
> #define ETH_P_QINQ3 0x9300 /* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
>
>
> > [0]
> > https://techhub.hpe.com/eginfolib/networking/docs/switches/7500/5200-1938a_l2-lan_cg/content/495503472.htm
> >
> > >
> > > static __always_inline int proto_is_vlan(__u16 h_proto)
> > > {
> > >         return !!(h_proto == bpf_htons(ETH_P_8021Q) ||
> > >                   h_proto == bpf_htons(ETH_P_8021AD));
> > > }
> > >
> > > [1] https://github.com/xdp-project/bpf-examples/blob/master/include/xdp/parsing_helpers.h#L75-L79
> > >
> > > Cc. Andrew Lunn, as I notice DSA has a fake VLAN define ETH_P_DSA_8021Q
> > > (in file include/uapi/linux/if_ether.h)
> > > Is this actually in use?
> > > Maybe some hardware can "VLAN" offload this?
> > >
> > >
> > > > What about rephrasing it this way:
> > > >
> > > > In case of success, vlan_proto contains VLAN protocol identifier (TPID),
> > > > vlan_tag contains the remaining 16 bits of a 802.1Q tag (PCP+DEI+VID).
> > > >
> > >
> > > Hmm, I think we can improve this further. This text becomes part of the
> > > documentation for end-users (target audience). Thus, I think it is
> > > worth being more verbose and even mention the existing defines that we
> > > are expecting end-users to take advantage of.
> > >
> > > What about:
> > >
> > > In case of success, the VLAN EtherType is stored in vlan_proto (usually
> > > either ETH_P_8021Q or ETH_P_8021AD), also known as TPID (Tag Protocol
> > > IDentifier).
> > > The VLAN tag is stored in vlan_tag, which is a 16-bit field
> > > containing sub-fields (PCP+DEI+VID). The VLAN ID (VID) is 12 bits,
> > > commonly extracted using mask VLAN_VID_MASK (0x0fff). For the meaning
> > > of the sub-fields Priority Code Point (PCP) and Drop Eligible Indicator
> > > (DEI) (formerly CFI) please reference other documentation. Remember
> > > these 16-bit fields are stored in network byte order. Thus, transformation
> > > with byte-order helper functions like bpf_ntohs() is needed.
> > >
> >
> > AFAIK, vlan_tag is stored in host byte order, this is how it is in skb.
>
> I'm not sure we should follow SKB storage scheme for XDP.
>

I think following the SKB convention is a good idea in this particular case.
As I have mentioned below, in ice the VLAN TCI in the descriptor already comes
in LE, so there is no point in converting it into BE only for somebody to call
bpf_ntohs() on it later anyway. We are not the only manufacturer that does this.

> > In ice, we receive VLAN tag in descriptor already in LE.
> > Only protocol is BE (network byte order). So I would replace the last 2
> > sentences with the following:
> >
> > vlan_tag is stored in host byte order, so no byte order conversion is needed.
>
> Yikes, that was unexpected. This needs to be heavily documented in docs.

You mean the motivation, why it is so and not the other way around?

>
> When parsing packets, it is in network-byte-order, else my code is wrong
> here[1]:
>
> [1] https://github.com/xdp-project/bpf-examples/blob/master/include/xdp/parsing_helpers.h#L122
>
> I'm accessing the skb->vlan_tci here [2], and I notice I don't do any
> byte-order conversions, so fortunately I didn't make a code mistake.
>
> [2] https://github.com/xdp-project/bpf-examples/blob/master/traffic-pacing-edt/edt_pacer_vlan.c#L215
>

In a raw packet, the VLAN TCI is in network byte order, but skb requires the
NIC/driver to convert it into host byte order before putting it into the skb.
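
To make the host-byte-order convention concrete, here is a minimal userspace
sketch of splitting a TCI value into PCP, DEI and VID. It is only an
illustration, not part of the patch: the TCI_* names, the struct and
vlan_tci_decode() are hypothetical, and the mask values are assumed to match
the kernel's VLAN_PRIO_MASK, VLAN_CFI_MASK and VLAN_VID_MASK. The input is
assumed to be already in host byte order, so no bpf_ntohs() is applied.

```c
#include <stdint.h>

/* Hypothetical local names; values assumed to mirror the kernel's
 * VLAN_PRIO_MASK, VLAN_CFI_MASK, VLAN_VID_MASK and VLAN_PRIO_SHIFT. */
#define TCI_PCP_MASK   0xe000u  /* Priority Code Point, 3 bits */
#define TCI_DEI_MASK   0x1000u  /* Drop Eligible Indicator, 1 bit */
#define TCI_VID_MASK   0x0fffu  /* VLAN ID, 12 bits */
#define TCI_PCP_SHIFT  13

struct vlan_tci_fields {
	uint8_t  pcp;
	uint8_t  dei;
	uint16_t vid;
};

/* Split a TCI that is already in host byte order into its sub-fields. */
static inline struct vlan_tci_fields vlan_tci_decode(uint16_t tci_host)
{
	struct vlan_tci_fields f;

	f.pcp = (uint8_t)((tci_host & TCI_PCP_MASK) >> TCI_PCP_SHIFT);
	f.dei = (tci_host & TCI_DEI_MASK) ? 1 : 0;
	f.vid = (uint16_t)(tci_host & TCI_VID_MASK);
	return f;
}
```

With this sketch, a TCI of 0xA064 decodes to PCP 5, DEI 0 and VID 100. If the
value were delivered in network byte order instead, the caller would have to
swap it first, which is the extra step argued against above.
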
> > vlan_proto is stored in network byte order, the suggested way to use this value:
> >
> > vlan_proto == bpf_htons(ETH_P_8021Q)
> >
> > >
> > >
>
> --Jesper
>
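
For reference, the suggested vlan_proto comparison can be sketched in plain
userspace C as below. This is an assumption-laden illustration rather than
code from the series: it uses htons() from <arpa/inet.h> in place of
bpf_htons(), and the TPID_* names and tpid_is_standard_vlan() are hypothetical
stand-ins (the values are those of ETH_P_8021Q and ETH_P_8021AD).

```c
#include <stdint.h>
#include <arpa/inet.h>  /* htons() */

/* Hypothetical local names; same values as ETH_P_8021Q / ETH_P_8021AD. */
#define TPID_8021Q   0x8100u
#define TPID_8021AD  0x88A8u

/* vlan_proto_be is expected in network byte order, as proposed for
 * vlan_proto. The constants are converted, not the variable. */
static inline int tpid_is_standard_vlan(uint16_t vlan_proto_be)
{
	return vlan_proto_be == htons(TPID_8021Q) ||
	       vlan_proto_be == htons(TPID_8021AD);
}
```

A non-standard TPID such as the deprecated 0x9100 (ETH_P_QINQ1) is rejected
by this check, so a program that wants to accept custom TPIDs would need to
compare against those values explicitly.
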