Re: [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 31/10/2022 23.55, Stanislav Fomichev wrote:
On Mon, Oct 31, 2022 at 3:38 PM Yonghong Song<yhs@xxxxxxxx>  wrote:

On 10/31/22 3:09 PM, Stanislav Fomichev wrote:
On Mon, Oct 31, 2022 at 12:36 PM Yonghong Song<yhs@xxxxxxxx>  wrote:

On 10/31/22 8:28 AM, Toke Høiland-Jørgensen wrote:
"Bezdeka, Florian"<florian.bezdeka@xxxxxxxxxxx>  writes:

On Fri, 2022-10-28 at 18:14 -0700, Jakub Kicinski wrote:
On Fri, 28 Oct 2022 16:16:17 -0700 John Fastabend wrote:
[...]
All parts of my application (BPF program included) should not be
optimized/adjusted for all the different HW variants out there.
Yes, absolutely agreed. Abstracting away those kinds of hardware
differences is the whole*point*  of having an OS/driver model. I.e.,
it's what the kernel is there for! If people want to bypass that and get
direct access to the hardware, they can already do that by using DPDK.

So in other words, 100% agreed that we should not expect the BPF
developers to deal with hardware details as would be required with a
kptr-based interface.

As for the kfunc-based interface, I think it shows some promise.
Exposing a list of function names to retrieve individual metadata items
instead of a struct layout is sorta comparable in terms of developer UI
accessibility etc (IMO).
>>>> Looks like there are quite some use cases for hw_timestamp.
Do you think we could add it to the uapi like struct xdp_md?

The following is the current xdp_md:
struct xdp_md {
           __u32 data;
           __u32 data_end;
           __u32 data_meta;
           /* Below access go through struct xdp_rxq_info */
           __u32 ingress_ifindex; /* rxq->dev->ifindex */
           __u32 rx_queue_index;  /* rxq->queue_index  */

           __u32 egress_ifindex;  /* txq->dev->ifindex */
};

We could add  __u64 hw_timestamp to the xdp_md so user
can just do xdp_md->hw_timestamp to get the value.
xdp_md->hw_timestamp == 0 means hw_timestamp is not
available.

Inside the kernel, the ctx rewriter can generate code
to call driver specific function to retrieve the data.
If the driver generates the code to retrieve the data, how's that
different from the kfunc approach?
The only difference I see is that it would be a more strong UAPI than
the kfuncs?
Right. it is a strong uapi.

The kfunc approach can be used to*less*  common use cases?
What's the advantage of having two approaches when one can cover
common and uncommon cases?

Beyond hw_timestamp, do we have any other fields ready to support?

If it ends up with lots of fields to be accessed by the bpf program,
and bpf program actually intends to access these fields,
using a strong uapi might be a good thing as it can make code
much streamlined.
> There are a bunch. Alexander's series has a good list:

https://github.com/alobakin/linux/commit/31bfe8035c995fdf4f1e378b3429d24b96846cc8


Below are the fields I've identified, which are close to what Alexander also found.

 struct xdp_hints_common {
	union {
		__wsum		csum;
		struct {
			__u16	csum_start;
			__u16	csum_offset;
		};
	};
	u16 rx_queue;
	u16 vlan_tci;
	u32 rx_hash32;
	u32 xdp_hints_flags;
	u64 btf_full_id; /* BTF object + type ID */
 } __attribute__((aligned(4))) __attribute__((packed));

Some of the fields are encoded via flags:

 enum xdp_hints_flags {
	HINT_FLAG_CSUM_TYPE_BIT0  = BIT(0),
	HINT_FLAG_CSUM_TYPE_BIT1  = BIT(1),
	HINT_FLAG_CSUM_TYPE_MASK  = 0x3,

	HINT_FLAG_CSUM_LEVEL_BIT0 = BIT(2),
	HINT_FLAG_CSUM_LEVEL_BIT1 = BIT(3),
	HINT_FLAG_CSUM_LEVEL_MASK = 0xC,
	HINT_FLAG_CSUM_LEVEL_SHIFT = 2,

	HINT_FLAG_RX_HASH_TYPE_BIT0 = BIT(4),
	HINT_FLAG_RX_HASH_TYPE_BIT1 = BIT(5),
	HINT_FLAG_RX_HASH_TYPE_MASK = 0x30,
	HINT_FLAG_RX_HASH_TYPE_SHIFT = 0x4,

	HINT_FLAG_RX_QUEUE = BIT(7),

	HINT_FLAG_VLAN_PRESENT            = BIT(8),
	HINT_FLAG_VLAN_PROTO_ETH_P_8021Q  = BIT(9),
	HINT_FLAG_VLAN_PROTO_ETH_P_8021AD = BIT(10),
	/* Flags from BIT(16) can be used by drivers */
 };

We can definitely call some of them more "common" than the others, but
not sure how strong of a definition that would be.

The important fields that would be worth considering as UAPI candidates
are: (1) RX-hash, (2) Hash-type and (3) RX-checksum.
With these three we can avoid calling the flow-dissector and GRO frame
aggregations works. (This currently hurts xdp_frame to SKB performance a
lot in practice).

*BUT* in it's current form above (incl. Alexanders approach/patch) it
would be a mistake to UAPI standardize the "(2) Hash-type" in this
simplified "reduced" form (which is what the SKB "needs").

There is a huge untapped potential in the Hash-type.  Thanks to
Microsoft almost all NIC hardware provided a Hash-type that gives us the
L3-protocol (IPv4 or IPv6) and the L4-protocol (UDP or TCP and sometimes
SCTP), plus info if extention-headers are provided. (Digging in
datasheets, we can often also get the header-size).

Think about how many cycles XDP BPF-prog can save parsing protocol
headers.  I'm also hoping we can leveregate this to allow SKBs created
from an xdp_frame to have skb->transport_header and skb->network_header
pre-populated (and skip some of these netstack layers).

--Jesper

p.s. in my patchset, I exposed the "raw" Hash-type bits from the descriptor in hope this would evolve.




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux