On 17/09/2024 07:26, 김지원 wrote: > On Mon, Sep 16, 2024 at 5:48 PM Nikolay Aleksandrov <razor@xxxxxxxxxxxxx> wrote: >> >> On 16/09/2024 08:50, Jiwon Kim wrote: >>> Add bond_xdp_check to ensure the bond interface is in a valid state. >>> >>> syzbot reported WARNING in bond_xdp_get_xmit_slave. >>> In bond_xdp_get_xmit_slave, the comment says >>> /* Should never happen. Mode guarded by bond_xdp_check() */. >>> However, it does not check the status when entering bond_xdp_xmit. >>> >>> Reported-by: syzbot+c187823a52ed505b2257@xxxxxxxxxxxxxxxxxxxxxxxxx >>> Closes: https://syzkaller.appspot.com/bug?extid=c187823a52ed505b2257 >>> Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver") >>> Signed-off-by: Jiwon Kim <jiwonaid0@xxxxxxxxx> >>> --- >>> drivers/net/bonding/bond_main.c | 33 ++++++++++++++++++--------------- >>> 1 file changed, 18 insertions(+), 15 deletions(-) >>> >> >> How did you figure the problem is there? Did you take any time to actually >> understand it? This patch doesn't fix anything, the warning can be easily >> triggered with it. The actual fix is to remove that WARN_ON() altogether >> and downgrade the netdev_err() to a ratelimited version. The reason is that >> we can always get to a state where at least 1 bond device has xdp program >> installed which increases bpf_master_redirect_enabled_key and another bond >> device which uses xdpgeneric, then install an ebpf program that simply >> returns ACT_TX on xdpgeneric bond's slave and voila - you get the warning. >> >> setup is[1]: >> $ ip l add veth0 type veth peer veth1 >> $ ip l add veth3 type veth peer veth4 >> $ ip l add bond0 type bond mode 6 # <- transmit-alb mode, unsupported by xdp >> $ ip l add bond1 type bond # <- rr mode by default, supported by xdp >> $ ip l set veth0 master bond1 >> $ ip l set bond1 up >> $ ip l set dev bond1 xdpdrv object tx_xdp.o section xdp_tx # <- we need xdpdrv program to increase the static key, more below >> $ ip l set veth3 master bond0 >> $ ip l set bond0 up >> $ ip l set veth4 up >> $ ip l set veth3 xdpgeneric object tx_xdp.o section xdp_tx # <- now we'll hit the codepath we need after veth3 Rx's a packet >> >> >> If you take the time to look at the call stack and the actual code, you'll >> see it goes something like (for the xdpgeneric bond slave, veth3): >> ... >> bpf_prog_run_generic_xdp() for veth3 >> -> bpf_prog_run_xdp() >> -> __bpf_prog_run() # return ACT_TX >> -> xdp_master_redirect() # called because we have ACT_TX && netif_is_bond_slave(xdp->rxq->dev) >> -> master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); # and here we go, WARN_ON() >> >> I've had a patch for awhile now about this and have taken the time to look into it. >> I guess it's time to dust it off and send it out for review. :) >> >> Thanks, >> Nik > > Hi Nikolay, > > Thank you for taking the time to provide a detailed setup and call > stack analysis. > Would you be handling the new patch? If you don't mind, may I revise > this patch to > > - Replace with net_ratelimit() > - Remove the WARN_ON() > - Update the comment appropriately > > Thanks again for your insights and patience. > > Sincerely, > > Jiwon Kim sure, I don't mind, change the patch and I'll review the new one Cheers, Nik