On Mon, Sep 16, 2024 at 5:48 PM Nikolay Aleksandrov <razor@xxxxxxxxxxxxx> wrote:
>
> On 16/09/2024 08:50, Jiwon Kim wrote:
> > Add bond_xdp_check to ensure the bond interface is in a valid state.
> >
> > syzbot reported WARNING in bond_xdp_get_xmit_slave.
> > In bond_xdp_get_xmit_slave, the comment says
> > /* Should never happen. Mode guarded by bond_xdp_check() */.
> > However, it does not check the status when entering bond_xdp_xmit.
> >
> > Reported-by: syzbot+c187823a52ed505b2257@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Closes: https://syzkaller.appspot.com/bug?extid=c187823a52ed505b2257
> > Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver")
> > Signed-off-by: Jiwon Kim <jiwonaid0@xxxxxxxxx>
> > ---
> > drivers/net/bonding/bond_main.c | 33 ++++++++++++++++++---------------
> > 1 file changed, 18 insertions(+), 15 deletions(-)
> >
>
> How did you figure the problem is there? Did you take any time to actually
> understand it? This patch doesn't fix anything, the warning can be easily
> triggered with it. The actual fix is to remove that WARN_ON() altogether
> and downgrade the netdev_err() to a ratelimited version. The reason is that
> we can always get to a state where at least 1 bond device has xdp program
> installed which increases bpf_master_redirect_enabled_key and another bond
> device which uses xdpgeneric, then install an ebpf program that simply
> returns ACT_TX on xdpgeneric bond's slave and voila - you get the warning.
>
> setup is[1]:
> $ ip l add veth0 type veth peer veth1
> $ ip l add veth3 type veth peer veth4
> $ ip l add bond0 type bond mode 6 # <- transmit-alb mode, unsupported by xdp
> $ ip l add bond1 type bond # <- rr mode by default, supported by xdp
> $ ip l set veth0 master bond1
> $ ip l set bond1 up
> $ ip l set dev bond1 xdpdrv object tx_xdp.o section xdp_tx # <- we need xdpdrv program to increase the static key, more below
> $ ip l set veth3 master bond0
> $ ip l set bond0 up
> $ ip l set veth4 up
> $ ip l set veth3 xdpgeneric object tx_xdp.o section xdp_tx # <- now we'll hit the codepath we need after veth3 Rx's a packet
>
>
> If you take the time to look at the call stack and the actual code, you'll
> see it goes something like (for the xdpgeneric bond slave, veth3):
> ...
> bpf_prog_run_generic_xdp() for veth3
>  -> bpf_prog_run_xdp()
>   -> __bpf_prog_run() # return ACT_TX
>   -> xdp_master_redirect() # called because we have ACT_TX && netif_is_bond_slave(xdp->rxq->dev)
>    -> master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); # and here we go, WARN_ON()
>
> I've had a patch for awhile now about this and have taken the time to look into it.
> I guess it's time to dust it off and send it out for review. :)
>
> Thanks,
> Nik

Hi Nikolay,

Thank you for taking the time to provide a detailed setup and call stack
analysis.

Would you be handling the new patch? If you don't mind, may I revise this
patch to:
- Replace the netdev_err() with a rate-limited version (net_ratelimit())
- Remove the WARN_ON()
- Update the comment appropriately

Thanks again for your insights and patience.

Sincerely,
Jiwon Kim
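
For anyone trying the quoted reproduction steps: the tx_xdp.o object they
load is not included in the thread. Below is a minimal sketch of what such a
program might look like, assuming it does nothing except return the XDP_TX
verdict (the action the quoted text refers to as ACT_TX). The file name
tx_xdp.c, the function name xdp_tx_prog and the local SEC() macro are
assumptions made for illustration; only the section name "xdp_tx" comes from
the quoted ip commands.

```c
/* Hypothetical tx_xdp.c: an illustrative sketch, not the file used in the thread. */
#include <linux/bpf.h>   /* struct xdp_md, enum xdp_action (XDP_TX) */

/* Local stand-in for libbpf's SEC() macro, so this file does not need libbpf headers. */
#define SEC(name) __attribute__((section(name), used))

/* Section name matches "section xdp_tx" in the quoted ip commands. */
SEC("xdp_tx")
int xdp_tx_prog(struct xdp_md *ctx)
{
	(void)ctx;       /* the packet is not inspected */
	return XDP_TX;   /* ask the kernel to transmit the frame back out */
}

/* License string placed in the "license" ELF section for the BPF loader. */
char _license[] SEC("license") = "GPL";
```

One common way to build such an object is
"clang -O2 -g -target bpf -c tx_xdp.c -o tx_xdp.o", though include paths may
need adjusting depending on the distribution.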