On Tue, Jan 4, 2022 at 9:15 AM YiFei Zhu <zhuyifei@xxxxxxxxxx> wrote: > > The helpers continue to use int for retval because all the hooks > are int-returning rather than long-returning. The return value of > bpf_set_retval is int for future-proofing, in case in the future > there may be errors trying to set the retval. > > After the previous patch, if a program rejects a syscall by > returning 0, an -EPERM will be generated no matter if the retval > is already set to -err. This patch change it being forced only if > retval is not -err. This is because we want to support, for > example, invoking bpf_set_retval(-EINVAL) and return 0, and have > the syscall return value be -EINVAL not -EPERM. > > This change is reflected in the sockopt_sk test which has been > updated to assert the errno is EINVAL instead of the EPERM. > The eBPF prog has to explicitly bpf_set_retval(-EPERM) if EPERM > is wanted. I also removed the explicit mentions of EPERM in the > comments in the prog. > > For BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY, the prior behavior is > that, if the return value is NET_XMIT_DROP, the packet is silently > dropped. We preserve this behavior for backward compatibility > reasons, so even if an errno is set, the errno does not return to > caller. However, setting a non-err to retval cannot propagate so > this is not allowed and we return a -EFAULT in that case. > > Signed-off-by: YiFei Zhu <zhuyifei@xxxxxxxxxx> > Reviewed-by: Stanislav Fomichev <sdf@xxxxxxxxxx> > --- > include/linux/bpf.h | 10 +++-- > include/uapi/linux/bpf.h | 18 +++++++++ > kernel/bpf/cgroup.c | 38 ++++++++++++++++++- > tools/include/uapi/linux/bpf.h | 18 +++++++++ > .../selftests/bpf/prog_tests/sockopt_sk.c | 2 +- > .../testing/selftests/bpf/progs/sockopt_sk.c | 32 ++++++++-------- > 6 files changed, 96 insertions(+), 22 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 88f6891e2b53..300df48fa0e0 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -1300,7 +1300,7 @@ BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu, > while ((prog = READ_ONCE(item->prog))) { > run_ctx.prog_item = item; > func_ret = run_prog(prog, ctx); > - if (!(func_ret & 1)) > + if (!(func_ret & 1) && !IS_ERR_VALUE((long)run_ctx.retval)) > run_ctx.retval = -EPERM; > *(ret_flags) |= (func_ret >> 1); > item++; > @@ -1330,7 +1330,7 @@ BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu, > old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); > while ((prog = READ_ONCE(item->prog))) { > run_ctx.prog_item = item; > - if (!run_prog(prog, ctx)) > + if (!run_prog(prog, ctx) && !IS_ERR_VALUE((long)run_ctx.retval)) > run_ctx.retval = -EPERM; > item++; > } > @@ -1390,7 +1390,7 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu, > * 0: NET_XMIT_SUCCESS skb should be transmitted > * 1: NET_XMIT_DROP skb should be dropped and cn > * 2: NET_XMIT_CN skb should be transmitted and cn > - * 3: -EPERM skb should be dropped > + * 3: -err skb should be dropped > */ > #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func) \ > ({ \ > @@ -1399,10 +1399,12 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu, > u32 _ret; \ > _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \ > _cn = _flags & BPF_RET_SET_CN; \ > + if (_ret && !IS_ERR_VALUE((long)_ret)) \ > + _ret = -EFAULT; \ > if (!_ret) \ > _ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS); \ > else \ > - _ret = (_cn ? NET_XMIT_DROP : -EPERM); \ > + _ret = (_cn ? NET_XMIT_DROP : _ret); \ Sorry for the long delay in reviewing. Overall it looks very good. Few questions: Why change this behavior for BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY ? It's for an inet_egress hook only. In other words ip_output. What kind of different error codes do you want to pass to the stack from there? > diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c > index 4b937e5dbaca..164aa5020bf1 100644 > --- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c > +++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c > @@ -177,7 +177,7 @@ static int getsetsockopt(void) > optlen = sizeof(buf.zc); > errno = 0; > err = getsockopt(fd, SOL_TCP, TCP_ZEROCOPY_RECEIVE, &buf, &optlen); > - if (errno != EPERM) { > + if (errno != EINVAL) { Could you explain which part of this patch caused this change in user visible behavior? I understand the desire to do bpf_set_retval(-EINVAL) and return 0, but progs/sockopt_sk.c is not doing it. Where does EINVAL come from?