Re: [PATCH bpf-next 2/3] bpf: Add cgroup helper bpf_export_errno to get/set exported errno value

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 26, 2021 at 2:26 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
>
> On Tue, Oct 26, 2021 at 1:50 PM YiFei Zhu <zhuyifei@xxxxxxxxxx> wrote:
> >
> > On Tue, Oct 26, 2021 at 8:44 AM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Oct 25, 2021 at 5:06 PM YiFei Zhu <zhuyifei@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Oct 20, 2021 at 4:28 PM Andrii Nakryiko
> > > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > > >
> > > > > it's subjective, but "bpf_export_errno" name is quite confusing. What
> > > > > are we "exporting" and where?
> > > > >
> > > > > I actually like Song's proposal for two helpers,
> > > > > bpf_set_err()/bpf_get_err(). It makes the semantics less confusing. I
> > > > > honestly don't remember the requirement to have one combined helper
> > > > > from the BPF office hour discussion, but if there was a good reason
> > > > > for that, please remind us.
> > > > >
> > > > > > + *     Description
> > > > > > + *             If *errno_val* is positive, set the syscall's return error code;
> > > > >
> > > > > This inversion of error code is also confusing. If we are to return
> > > > > -EXXX, bpf_set_err(EXXX) is quite confusing.
> > > > >
> > > > > > + *             if *errno_val* is zero, retrieve the previously set code.
> > > > >
> > > > > Also, are there use cases where zero is the valid "error" (or lack of
> > > > > it, rather). I.e., wouldn't there be cases where you want to clear a
> > > > > previous error? We might have discussed this, sorry if I forgot.
> > > >
> > > > Hmm, originally I thought it's best to assume the underlying
> > > > assumption is that filters may set policies and it would violate it if
> > > > policies become ignored; however one could argue that debugging would
> > > > be a use case for an error-clearing filter.
> > > >
> > > > Let's say we do bpf_set_err()/bpf_get_err(), with the ability to clear
> > > > errors. I'm having trouble thinking of the best way to have it
> > > > interact with the getsockopt "retval" in its context:
> > > > * Let's say the kernel initially sets an error code in the retval. I
> > > > think it would be a surprising behavior if only "retval" but not
> > > > bpf_get_err() shows the error. Therefore we'd need to initialize "err"
> > > > with the "retval" if retval is an error.
> > > > * If we initialize "err" with the "retval", then for a prog to clear
> > > > the error they'd need to clear it twice, once with bpf_set_err(0) with
> > > > and another with ctx->retval = 0. This will immediately break backward
> > > > compatibility. Therefore, we'd need to mirror the setting of
> > > > ctx->retval = 0 to bpf_set_err(0)
> > > > * In that case, what to do if a user uses ctx->retval as a way to pass
> > > > data between filters? I mean, whether ctx->retval is set to 0 or the
> > > > original is only checked after all filters are run. It could be any
> > > > value while the filters are running.
> > > > * A second issue, if we have first a legacy filter that returns 0 to
> > > > set EPERM, and then there's another filter that does a ctx->retval =
> > > > 0. The original behavior would be that the syscall fails with EPERM,
> > > > but if we mirror ctx->retval = 0 to bpf_set_err(0), then that EPERM
> > > > would be cleared.
> > > >
> > > > One of the reasons I liked "export" is that it's slightly clearer that
> > > > this value is strictly from the BPF's side and has nothing to do with
> > > > what the kernel sets (as in the getsockopt case). But yeah I agree
> > > > it's not an ideal name.
> > >
> > > For getsockopt, maybe the best way to go is to point ctx->retval to
> > > run_ctx.errno_val? (i.e., bpf_set_err would be equivalent to doing
> > > ctx->retval = x;). We can leave ctx->retval as a backwards-compatible
> > > legacy way of doing things. For new programs, bpf_set_err would work
> > > universally, regardless of attach type. Any cons here?
> >
> > Is it a concern that AFAICT getsockopt retval may be a positive number
> > whereas the err here must be non-negative?
>
> getsockopt retval is either -errno or 0. It's not really enforced at
> load/attach time, but there is a runtime check which returns -EFAULT
> if the prog sets it to something else.
>
> > Also the fourth point still stands. If any getsockopt returns 0,
> > original behavior is return -EPERM whereas new behavior, clearing
> > retval will clear -EPERM.
>
> True, but do you think these cases exist out there? I guess somebody
> can do it inadvertently, but the example you've mentioned doesn't
> really make sense, right?
> This is why we are adding a way to propagate the status, so the
> programs in the chain can understand whether they should do anything
> at all (previous prog returned EPERM). Returning EPERM from the child
> and then doing ctx->retval=0 in the parent should already not work as
> expected.

How about this? Have a bpf_{get,set}_retval that mirrors (in both
directions) the ctx->retval without any processing. Considering
in-kernel implementations of getsockopt sometimes return positive
values (usually optlen), we could allow eBPF-implemented getsockopt to
do so too, by relaxing the current 'only change to zero or keep the
same restriction, and allow it the filter to set arbitrary return
values to user space. For a filter that runs before the in-kernel
implementation, such as setsockopt or cgroup_skb, we verify after
running all the hooks, that it must be 0 or a negative number in
-errno; -EFAULT otherwise.

For legacy -EPERM programs that do it by returning 0, a filter that
bpf_set_retval(0) or ctx->retval = 0 will clear the -EPERM, this will
be different from the current behavior of getsockopt programs. I'm not
really sure of any use cases where users would rely on the current
behavior -- one would do ctx->retval = 0 to tell userspace that
something is done, yet another filter denies that 'something'? Doesn't
make sense to me, but correct me if I'm wrong or if we think this UAPI
must be kept exactly the same.

Another potential UAPI breakage is that originally getsockopt hooks
can inject an -EFAULT instead of -EPERM by setting bogus values to
ctx->retval. Now they have to do it by setting ctx->retval = -EFAULT;
any other value, even bogus values will be passed to userspace. That
said, I'm not sure why anyone would want to return an -EFAULT instead
of -EPERM; some unusual fault injection maybe? And in that case if
they literally want -EFAULT, the statement that makes sense would
already be ctx->retval = -EFAULT, which is usually a bogus value.

Considering that here we would have bpf_{get,set}_retval with no
in-kernel processing at all to mirror a value in ctx... I think it
would make a lot of sense to just use a context variable instead of
helpers (i.e. provide ctx->retval to all cgroup program types)?

YiFei Zhu



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux