Re: [PATCH bpf-next 2/3] bpf: Add cgroup helper bpf_export_errno to get/set exported errno value

Stanislav Fomichev <sdf@xxxxxxxxxx> · Tue, 26 Oct 2021 14:26:42 -0700

On Tue, Oct 26, 2021 at 1:50 PM YiFei Zhu <zhuyifei@xxxxxxxxxx> wrote:
>
> On Tue, Oct 26, 2021 at 8:44 AM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> >
> > On Mon, Oct 25, 2021 at 5:06 PM YiFei Zhu <zhuyifei@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Oct 20, 2021 at 4:28 PM Andrii Nakryiko
> > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > >
> > > > it's subjective, but "bpf_export_errno" name is quite confusing. What
> > > > are we "exporting" and where?
> > > >
> > > > I actually like Song's proposal for two helpers,
> > > > bpf_set_err()/bpf_get_err(). It makes the semantics less confusing. I
> > > > honestly don't remember the requirement to have one combined helper
> > > > from the BPF office hour discussion, but if there was a good reason
> > > > for that, please remind us.
> > > >
> > > > > + *     Description
> > > > > + *             If *errno_val* is positive, set the syscall's return error code;
> > > >
> > > > This inversion of error code is also confusing. If we are to return
> > > > -EXXX, bpf_set_err(EXXX) is quite confusing.
> > > >
> > > > > + *             if *errno_val* is zero, retrieve the previously set code.
> > > >
> > > > Also, are there use cases where zero is the valid "error" (or lack of
> > > > it, rather). I.e., wouldn't there be cases where you want to clear a
> > > > previous error? We might have discussed this, sorry if I forgot.
> > >
> > > Hmm, originally I thought it's best to assume the underlying
> > > assumption is that filters may set policies and it would violate it if
> > > policies become ignored; however one could argue that debugging would
> > > be a use case for an error-clearing filter.
> > >
> > > Let's say we do bpf_set_err()/bpf_get_err(), with the ability to clear
> > > errors. I'm having trouble thinking of the best way to have it
> > > interact with the getsockopt "retval" in its context:
> > > * Let's say the kernel initially sets an error code in the retval. I
> > > think it would be a surprising behavior if only "retval" but not
> > > bpf_get_err() shows the error. Therefore we'd need to initialize "err"
> > > with the "retval" if retval is an error.
> > > * If we initialize "err" with the "retval", then for a prog to clear
> > > the error they'd need to clear it twice, once with bpf_set_err(0) with
> > > and another with ctx->retval = 0. This will immediately break backward
> > > compatibility. Therefore, we'd need to mirror the setting of
> > > ctx->retval = 0 to bpf_set_err(0)
> > > * In that case, what to do if a user uses ctx->retval as a way to pass
> > > data between filters? I mean, whether ctx->retval is set to 0 or the
> > > original is only checked after all filters are run. It could be any
> > > value while the filters are running.
> > > * A second issue, if we have first a legacy filter that returns 0 to
> > > set EPERM, and then there's another filter that does a ctx->retval =
> > > 0. The original behavior would be that the syscall fails with EPERM,
> > > but if we mirror ctx->retval = 0 to bpf_set_err(0), then that EPERM
> > > would be cleared.
> > >
> > > One of the reasons I liked "export" is that it's slightly clearer that
> > > this value is strictly from the BPF's side and has nothing to do with
> > > what the kernel sets (as in the getsockopt case). But yeah I agree
> > > it's not an ideal name.
> >
> > For getsockopt, maybe the best way to go is to point ctx->retval to
> > run_ctx.errno_val? (i.e., bpf_set_err would be equivalent to doing
> > ctx->retval = x;). We can leave ctx->retval as a backwards-compatible
> > legacy way of doing things. For new programs, bpf_set_err would work
> > universally, regardless of attach type. Any cons here?
>
> Is it a concern that AFAICT getsockopt retval may be a positive number
> whereas the err here must be non-negative?

getsockopt retval is either -errno or 0. It's not really enforced at
load/attach time, but there is a runtime check which returns -EFAULT
if the prog sets it to something else.

> Also the fourth point still stands. If any getsockopt returns 0,
> original behavior is return -EPERM whereas new behavior, clearing
> retval will clear -EPERM.

True, but do you think these cases exist out there? I guess somebody
can do it inadvertently, but the example you've mentioned doesn't
really make sense, right?
This is why we are adding a way to propagate the status, so the
programs in the chain can understand whether they should do anything
at all (previous prog returned EPERM). Returning EPERM from the child
and then doing ctx->retval=0 in the parent should already not work as
expected.