Re: [PATCH bpf-next 1/2] bpf: return correct -ENOBUFS from bpf_clone_redirect

Stanislav Fomichev <sdf@xxxxxxxxxx> · Mon, 11 Sep 2023 10:11:02 -0700

On 09/09, Martin KaFai Lau wrote:
> On 9/8/23 2:00 PM, Stanislav Fomichev wrote:
> > Commit 151e887d8ff9 ("veth: Fixing transmit return status for dropped
> > packets") exposed the fact that bpf_clone_redirect is capable of
> > returning raw NET_XMIT_XXX return codes.
> > 
> > This conflicts with its UAPI doc, which says the following:
> > "0 on success, or a negative error in case of failure."
> > 
> > Let's wrap dev_queue_xmit's return value (in __bpf_tx_skb) into
> > net_xmit_errno to make sure we correctly propagate NET_XMIT_DROP
> > as -ENOBUFS instead of 1.
> > 
> > Note, this is technically breaking existing UAPI where we used to
> > return 1 and now will do -ENOBUFS. The alternative is to
> > document that bpf_clone_redirect can return 1 for DROP and 2 for CN.
> > 
> > Reported-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> > Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> > ---
> >   net/core/filter.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index a094694899c9..9e297931b02f 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -2129,6 +2129,9 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
> >   	ret = dev_queue_xmit(skb);
> >   	dev_xmit_recursion_dec();
> > +	if (ret > 0)
> > +		ret = net_xmit_errno(ret);
> 
> I think it is better to have bpf_clone_redirect returning -ENOBUFS instead
> of leaking NET_XMIT_XXX to the uapi. The bpf_clone_redirect in the
> uapi/bpf.h also mentions
> 
>  *      Return
>  *              0 on success, or a negative error in case of failure.
> 
> If -ENOBUFS is returned in __bpf_tx_skb, should the same be done for
> __bpf_rx_skb? And should net_xmit_errno() only be done for
> bpf_clone_redirect()?  __bpf_{tx,rx}_skb is also used by skb_do_redirect(),
> which also calls __bpf_redirect_neigh() that returns NET_XMIT_xxx, but no
> caller seems to care about the NET_XMIT_xxx value now.

__bpf_rx_skb seems to only add to backlog and doesn't seem to return any
of the NET_XMIT_xxx. But I might be wrong and haven't looked too deep
into that.

> Daniel should know more here. I would wait for Daniel to comment.

Ack, sure!

> ~~~~
> 
> For the selftest, maybe another option is to use a 28-byte data_in for the
> lwt program redirecting to veth? 14 bytes used by bpf_prog_test_run_skb and
> leave 14 bytes for veth_xmit. It seems the original intention of the "veth
> ETH_HLEN+1 packet ingress" test is that it should succeed as well.

IIUC, you're suggesting to pass full ipv4 or ipv6 packet for veth tests
to make them actually succeed with the forwarding, right?

Sure, I can do that. But let's keep this entry with the -ENOBUFS as well?
Just for the sake of ensuring that we don't export NET_XMIT_xxx from
uapi.
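
For reference, here is a minimal userspace sketch of what the proposed
wrap does to each return value. The NET_XMIT_xxx values and the
net_xmit_errno() macro are mirrored from include/linux/netdevice.h as I
understand them; wrap_xmit_ret() and wrap_xmit_ret_selfcheck() are
hypothetical names used only for illustration and are not part of the patch.

```c
#include <assert.h>
#include <errno.h>

/* Mirrored from include/linux/netdevice.h (assumed unchanged). */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01 /* skb dropped */
#define NET_XMIT_CN      0x02 /* congestion notification */

/* CN still counts as success here; any other positive code is -ENOBUFS. */
#define net_xmit_errno(e) ((e) != NET_XMIT_CN ? -ENOBUFS : 0)

/*
 * Hypothetical stand-in for the tail of __bpf_tx_skb(): takes the value
 * that dev_queue_xmit() would have returned and applies the same check
 * the patch adds.
 */
int wrap_xmit_ret(int ret)
{
	if (ret > 0)
		ret = net_xmit_errno(ret);
	return ret;
}

/* Hypothetical self-check covering the cases discussed in this thread. */
void wrap_xmit_ret_selfcheck(void)
{
	assert(wrap_xmit_ret(NET_XMIT_SUCCESS) == 0);
	assert(wrap_xmit_ret(NET_XMIT_DROP) == -ENOBUFS); /* used to leak 1 */
	assert(wrap_xmit_ret(NET_XMIT_CN) == 0);          /* used to leak 2 */
	assert(wrap_xmit_ret(-ENOMEM) == -ENOMEM);        /* errors pass through */
}
```

With this mapping, the helper's caller only ever sees 0 or a negative
errno, which matches the "0 on success, or a negative error" wording in
the UAPI doc.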