Re: [PATCH v3 bpf-next 3/4] bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return value

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 19 Jan 2022 13:28:59 -0800

On Wed, Jan 19, 2022 at 12:22 PM YiFei Zhu <zhuyifei@xxxxxxxxxx> wrote:
>
> On Wed, Jan 19, 2022 at 11:50 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Tue, Jan 4, 2022 at 9:15 AM YiFei Zhu <zhuyifei@xxxxxxxxxx> wrote:
> > >
> > > The helpers continue to use int for retval because all the hooks
> > > are int-returning rather than long-returning. The return value of
> > > bpf_set_retval is int for future-proofing, in case in the future
> > > there may be errors trying to set the retval.
> > >
> > > After the previous patch, if a program rejects a syscall by
> > > returning 0, an -EPERM will be generated no matter if the retval
> > > is already set to -err. This patch change it being forced only if
> > > retval is not -err. This is because we want to support, for
> > > example, invoking bpf_set_retval(-EINVAL) and return 0, and have
> > > the syscall return value be -EINVAL not -EPERM.
> > >
> > > This change is reflected in the sockopt_sk test which has been
> > > updated to assert the errno is EINVAL instead of the EPERM.
> > > The eBPF prog has to explicitly bpf_set_retval(-EPERM) if EPERM
> > > is wanted. I also removed the explicit mentions of EPERM in the
> > > comments in the prog.
> > >
> > > For BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY, the prior behavior is
> > > that, if the return value is NET_XMIT_DROP, the packet is silently
> > > dropped. We preserve this behavior for backward compatibility
> > > reasons, so even if an errno is set, the errno does not return to
> > > caller. However, setting a non-err to retval cannot propagate so
> > > this is not allowed and we return a -EFAULT in that case.
> > >
> > > Signed-off-by: YiFei Zhu <zhuyifei@xxxxxxxxxx>
> > > Reviewed-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> > > ---
> > >  include/linux/bpf.h                           | 10 +++--
> > >  include/uapi/linux/bpf.h                      | 18 +++++++++
> > >  kernel/bpf/cgroup.c                           | 38 ++++++++++++++++++-
> > >  tools/include/uapi/linux/bpf.h                | 18 +++++++++
> > >  .../selftests/bpf/prog_tests/sockopt_sk.c     |  2 +-
> > >  .../testing/selftests/bpf/progs/sockopt_sk.c  | 32 ++++++++--------
> > >  6 files changed, 96 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 88f6891e2b53..300df48fa0e0 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -1300,7 +1300,7 @@ BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
> > >         while ((prog = READ_ONCE(item->prog))) {
> > >                 run_ctx.prog_item = item;
> > >                 func_ret = run_prog(prog, ctx);
> > > -               if (!(func_ret & 1))
> > > +               if (!(func_ret & 1) && !IS_ERR_VALUE((long)run_ctx.retval))
> > >                         run_ctx.retval = -EPERM;
> > >                 *(ret_flags) |= (func_ret >> 1);
> > >                 item++;
> > > @@ -1330,7 +1330,7 @@ BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
> > >         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > >         while ((prog = READ_ONCE(item->prog))) {
> > >                 run_ctx.prog_item = item;
> > > -               if (!run_prog(prog, ctx))
> > > +               if (!run_prog(prog, ctx) && !IS_ERR_VALUE((long)run_ctx.retval))
> > >                         run_ctx.retval = -EPERM;
> > >                 item++;
> > >         }
> > > @@ -1390,7 +1390,7 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
> > >   *   0: NET_XMIT_SUCCESS  skb should be transmitted
> > >   *   1: NET_XMIT_DROP     skb should be dropped and cn
> > >   *   2: NET_XMIT_CN       skb should be transmitted and cn
> > > - *   3: -EPERM            skb should be dropped
> > > + *   3: -err              skb should be dropped
> > >   */
> > >  #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)                \
> > >         ({                                              \
> > > @@ -1399,10 +1399,12 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
> > >                 u32 _ret;                               \
> > >                 _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \
> > >                 _cn = _flags & BPF_RET_SET_CN;          \
> > > +               if (_ret && !IS_ERR_VALUE((long)_ret))  \
> > > +                       _ret = -EFAULT;                 \
> > >                 if (!_ret)                              \
> > >                         _ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS);  \
> > >                 else                                    \
> > > -                       _ret = (_cn ? NET_XMIT_DROP : -EPERM);          \
> > > +                       _ret = (_cn ? NET_XMIT_DROP : _ret);            \
> >
> > Sorry for the long delay in reviewing.
> > Overall it looks very good.
> > Few questions:
> >
> > Why change this behavior for BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY ?
> > It's for an inet_egress hook only. In other words ip_output.
> > What kind of different error codes do you want to pass to
> > the stack from there?
>
> I don't really have a use case in mind for a different error code for
> an egress hook (my use cases are for sockopt hooks) at the moment, but
> it sounds to me that it would a surprising behavior if bpf_set_retval
> is provided yet it would still be -EPERM.

Good point.

> > > diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> > > index 4b937e5dbaca..164aa5020bf1 100644
> > > --- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> > > +++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> > > @@ -177,7 +177,7 @@ static int getsetsockopt(void)
> > >         optlen = sizeof(buf.zc);
> > >         errno = 0;
> > >         err = getsockopt(fd, SOL_TCP, TCP_ZEROCOPY_RECEIVE, &buf, &optlen);
> > > -       if (errno != EPERM) {
> > > +       if (errno != EINVAL) {
> >
> > Could you explain which part of this patch caused this change
> > in user visible behavior?
> > I understand the desire to do bpf_set_retval(-EINVAL) and return 0,
> > but progs/sockopt_sk.c is not doing it.
> > Where does EINVAL come from?
>
> This comes from the kernel. In for an anvalid address to the
> getsockopt handler in tcp_zerocopy_receive [1]. The original behavior
> prior to this series is that the eBPF getsockopt hook generating an
> -EPERM overrides that of the kernel's -EINVAL, but now when the eBPF
> hook returns 0, it sees that an -EINVAL is already set by the kernel
> and does not modify the error code.

Got it.
I've added the following to the last patch to clarify it:
-       buf.zc.address = 12345; /* rejected by BPF */
+       buf.zc.address = 12345; /* Not page aligned. Rejected by
tcp_zerocopy_receive() */

and applied the set to bpf-next. Thanks!