Hey there,

While taking a closer look at how the RX metadata kfuncs are implemented in the mlx5 and ice drivers, I suspected a bug and, after testing, was in fact able to trigger a NULL pointer dereference.

The mlx5 driver implements the RX metadata kfuncs (for example bpf_xdp_metadata_rx_vlan_tag) by casting the xdp_md pointer from the function argument to an mlx5e_xdp_buff pointer. This is needed to get access to the packet metadata; see mlx5e_xdp_rx_vlan_tag for an example. The ice driver works similarly.

This is normally fine, because these drivers always allocate the xdp_buff as part of a full driver-specific struct (mlx5e_xdp_buff in the mlx5 case). But when a device-bound XDP program is attached to the mlx5 netdevice in generic mode, the xdp_buff is not allocated by the mlx5 driver but as part of the do_xdp_generic implementation. When a packet then comes in and the XDP program calls one of these kfuncs, the kfunc implementation dereferences pointers inside an mlx5e_xdp_buff struct that was never fully allocated, leading to a NULL pointer dereference.

Is there perhaps a check missing somewhere that should prevent the use of these kfuncs in the scope of do_xdp_generic? Or might there be another way to implement the RX metadata kfuncs in the driver that does not involve casting the xdp_buff pointer?
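To make the layout assumption behind the cast concrete, here is a small user-space sketch. It is not kernel code: struct xdp_buff_model and struct drv_xdp_buff_model are hypothetical, simplified stand-ins for xdp_buff and a driver wrapper such as mlx5e_xdp_buff, and cqe_field_in_bounds is a helper I made up purely for illustration. It only shows that the driver-private pointer sits past the end of the generic buffer, so a buffer that was allocated as a bare xdp_buff does not contain it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct xdp_buff (fields simplified). */
struct xdp_buff_model {
	void *data;
	void *data_end;
	void *data_meta;
};

/*
 * Hypothetical stand-in for a driver wrapper such as mlx5e_xdp_buff:
 * the generic buffer comes first, driver-private pointers follow it.
 */
struct drv_xdp_buff_model {
	struct xdp_buff_model xdp;	/* must stay first so the cast works */
	const void *cqe;		/* completion entry a kfunc would read */
	void *rq;			/* receive queue context */
};

/* The driver-private fields start where the generic buffer ends. */
static_assert(offsetof(struct drv_xdp_buff_model, cqe) >=
	      sizeof(struct xdp_buff_model),
	      "driver fields live past the end of the generic buffer");

/*
 * Returns true if a buffer of alloc_size bytes is large enough to contain
 * the cqe pointer that a cast-based kfunc would read.
 */
static bool cqe_field_in_bounds(size_t alloc_size)
{
	return alloc_size >=
	       offsetof(struct drv_xdp_buff_model, cqe) + sizeof(const void *);
}
```

Calling cqe_field_in_bounds(sizeof(struct drv_xdp_buff_model)) models the native driver path, where the wrapper struct is fully allocated. Calling it with sizeof(struct xdp_buff_model) models the generic path, where only the plain buffer exists and the cast would read whatever happens to follow it in memory.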
Here is how this can be reproduced:

eBPF program:

    #include <bpf.h>

    extern int bpf_xdp_metadata_rx_vlan_tag(
        const struct xdp_md *ctx, __be16 *vlan_proto, __u16 *vlan_tci) __ksym;

    SEC("xdp")
    int ingress(struct xdp_md *ctx) {
        __be16 vlan_proto;
        __u16 vlan_tci;

        if (bpf_xdp_metadata_rx_vlan_tag(ctx, &vlan_proto, &vlan_tci) != 0) {
            return XDP_ABORTED;
        }

        return XDP_DROP;
    }

    char _license[] SEC("license") = "GPL";

Load and attach it as a device-bound program to a mlx5 NIC in XDP-generic mode:

    # bpftool prog load crash.o /sys/fs/bpf/crash xdpmeta_dev mlx5-conx5-1
    # bpftool net attach xdpgeneric pinned /sys/fs/bpf/crash dev mlx5-conx5-1

Then make sure a packet is coming in on that NIC port so the XDP program gets called:

    # ping -I mlx5-conx5-2 1.1.1.1

In my testing environment, mlx5-conx5-2 and mlx5-conx5-1 are directly connected.

Kernel output:

    Unable to handle kernel NULL pointer dereference at virtual address 000000000000001d
    Mem abort info:
      ESR = 0x0000000096000004
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x04: level 0 translation fault
    Data abort info:
      ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
      CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
    user pgtable: 4k pages, 48-bit VAs, pgdp=000008035e557000
    [000000000000001d] pgd=0000000000000000, p4d=0000000000000000

This was reproduced with Linux 6.12 mainline (adc2186).

--
Best regards,

Marcus Wichelmann
Linux Networking Specialist
Team SDN
Hetzner Cloud GmbH