Re: [RFC PATCH v1 14/14] selftests/bpf: Add tests for exceptions runtime cleanup

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Tue, 13 Feb 2024 21:51:30 +0100

On Tue, 13 Feb 2024 at 20:33, David Vernet <void@xxxxxxxxxxxxx> wrote:
>
> On Mon, Feb 12, 2024 at 11:43:42PM +0100, Kumar Kartikeya Dwivedi wrote:
> > On Mon, 12 Feb 2024 at 21:53, David Vernet <void@xxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Feb 01, 2024 at 04:21:09AM +0000, Kumar Kartikeya Dwivedi wrote:
> > > > Add tests for the runtime cleanup support for exceptions, ensuring that
> > > > resources are correctly identified and released when an exception is
> > > > thrown. Also, we add negative tests to exercise corner cases the
> > > > verifier should reject.
> > > >
> > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
> > > > ---
> > > >  tools/testing/selftests/bpf/DENYLIST.aarch64  |   1 +
> > > >  tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
> > > >  .../bpf/prog_tests/exceptions_cleanup.c       |  65 +++
> > > >  .../selftests/bpf/progs/exceptions_cleanup.c  | 468 ++++++++++++++++++
> > > >  .../bpf/progs/exceptions_cleanup_fail.c       | 154 ++++++
> > > >  .../selftests/bpf/progs/exceptions_fail.c     |  13 -
> > > >  6 files changed, 689 insertions(+), 13 deletions(-)
> > > >  create mode 100644 tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> > > >  create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup.c
> > > >  create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup_fail.c
> > > >
> > > > diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
> > > > index 5c2cc7e8c5d0..6fc79727cd14 100644
> > > > --- a/tools/testing/selftests/bpf/DENYLIST.aarch64
> > > > +++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
> > > > @@ -1,6 +1,7 @@
> > > >  bpf_cookie/multi_kprobe_attach_api               # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
> > > >  bpf_cookie/multi_kprobe_link_api                 # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
> > > >  exceptions                                    # JIT does not support calling kfunc bpf_throw: -524
> > > > +exceptions_unwind                             # JIT does not support calling kfunc bpf_throw: -524
> > > >  fexit_sleep                                      # The test never returns. The remaining tests cannot start.
> > > >  kprobe_multi_bench_attach                        # needs CONFIG_FPROBE
> > > >  kprobe_multi_test                                # needs CONFIG_FPROBE
> > > > diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
> > > > index 1a63996c0304..f09a73dee72c 100644
> > > > --- a/tools/testing/selftests/bpf/DENYLIST.s390x
> > > > +++ b/tools/testing/selftests/bpf/DENYLIST.s390x
> > > > @@ -1,5 +1,6 @@
> > > >  # TEMPORARY
> > > >  # Alphabetical order
> > > >  exceptions                            # JIT does not support calling kfunc bpf_throw                                (exceptions)
> > > > +exceptions_unwind                     # JIT does not support calling kfunc bpf_throw                                (exceptions)
> > > >  get_stack_raw_tp                         # user_stack corrupted user stack                                             (no backchain userspace)
> > > >  stacktrace_build_id                      # compare_map_keys stackid_hmap vs. stackmap err -2 errno 2                   (?)
> > > > diff --git a/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c b/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> > > > new file mode 100644
> > > > index 000000000000..78df037b60ea
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> > > > @@ -0,0 +1,65 @@
> > > > +#include "bpf/bpf.h"
> > > > +#include "exceptions.skel.h"
> > > > +#include <test_progs.h>
> > > > +#include <network_helpers.h>
> > > > +
> > > > +#include "exceptions_cleanup.skel.h"
> > > > +#include "exceptions_cleanup_fail.skel.h"
> > > > +
> > > > +static void test_exceptions_cleanup_fail(void)
> > > > +{
> > > > +     RUN_TESTS(exceptions_cleanup_fail);
> > > > +}
> > > > +
> > > > +void test_exceptions_cleanup(void)
> > > > +{
> > > > +     LIBBPF_OPTS(bpf_test_run_opts, ropts,
> > > > +             .data_in = &pkt_v4,
> > > > +             .data_size_in = sizeof(pkt_v4),
> > > > +             .repeat = 1,
> > > > +     );
> > > > +     struct exceptions_cleanup *skel;
> > > > +     int ret;
> > > > +
> > > > +     if (test__start_subtest("exceptions_cleanup_fail"))
> > > > +             test_exceptions_cleanup_fail();
> > >
> > > RUN_TESTS takes care of doing test__start_subtest(), etc. You should be
> > > able to just call RUN_TESTS(exceptions_cleanup_fail) directly here.
> > >
> >
> > Ack, will fix.
> >
> > > > +
> > > > +     skel = exceptions_cleanup__open_and_load();
> > > > +     if (!ASSERT_OK_PTR(skel, "exceptions_cleanup__open_and_load"))
> > > > +             return;
> > > > +
> > > > +     ret = exceptions_cleanup__attach(skel);
> > > > +     if (!ASSERT_OK(ret, "exceptions_cleanup__attach"))
> > > > +             return;
> > > > +
> > > > +#define RUN_EXC_CLEANUP_TEST(name)                                      \
> > >
> > > Should we add a call to if (test__start_subtest(#name)) to this macro?
> > >
> >
> > Makes sense, will change this.
> >
> > > > [...]
> > > > +
> > > > +SEC("tc")
> > > > +int exceptions_cleanup_percpu_obj(struct __sk_buff *ctx)
> > > > +{
> > > > +    struct { int i; } *p;
> > > > +
> > > > +    p = bpf_percpu_obj_new(typeof(*p));
> > > > +    MARK_RESOURCE(&p, RES_SPILL);
> > > > +    bpf_throw(VAL);
> > >
> > > It would be neat if we could have the bpf_throw() kfunc signature be
> > > marked as __attribute__((noreturn)) and have things work correctly;
> > > meaning you wouldn't have to even return a value here. The verifier
> > > should know that bpf_throw() is terminal, so it should be able to prune
> > > any subsequent instructions as unreachable anyways.
> > >
> >
> > Originally, I was tagging the kfunc as noreturn, but Alexei advised
> > against it in
> > https://lore.kernel.org/bpf/CAADnVQJtUD6+gYinr+6ensj58qt2LeBj4dvT7Cyu-aBCafsP5g@xxxxxxxxxxxxxx
> > ... so I have dropped it since.
>
> I see. Ok, we can ignore this for now, though I think we should consider
> revisiting this at some point once we've clarified the rules behind the
> implicit prologue/epilogue. Being able to actually specify noreturn
> really can make a difference in performance in some cases.
>

I agree. I will add this to my TODO list to explore after this set is merged.

> [...]
> > > > +
> > > > +SEC("tc")
> > > > +int exceptions_cleanup_reg(struct __sk_buff *ctx)
> > > > +{
> > > > +    void *p;
> > > > +
> > > > +    p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> > > > +    MARK_RESOURCE(p, RES_REG);
> > > > +    bpf_throw(VAL);
> > > > +    if (p)
> > > > +        bpf_ringbuf_discard(p, 0);
> > >
> > > Does the prog fail to load if you don't have this bpf_ringbuf_discard()
> > > check? I assume not given that in
> > > exceptions_cleanup_null_or_ptr_do_ptr() and elsewhere we do a reserve
> > > without discarding. Is there some subtle stack state difference here or
> > > something?
> > >
> >
> > So I will add comments explaining this, since I realized this confused
> > you in a couple of places, but basically if I didn't do a discard
> > here, the compiler wouldn't save the value of p across the bpf_throw
> > call. So it may end up in some caller-saved register (R1-R5) and since
> > bpf_throw needs things to be either saved in the stack or in
> > callee-saved regs (R6-R9) to be able to do the stack unwinding, we
> > would not be able to test the case where the resource is held in
> > R6-R9.
> >
> > In a correctly written program, in the path where bpf_throw is not
> > done, you will always have some cleanup code (otherwise your program
> > wouldn't pass), so the value should always end up being preserved
> > across a bpf_throw call (this is kind of why Alexei was sort of
> > worried about noreturn, because in that case the compiler may decide
> > to not preserve it for the bpf_throw path).
> > You cannot just leak a resource acquired before bpf_throw in the path
> > where exception is not thrown.
>
> Ok, that makes sense. I suppose another way to frame this would be to
> consider it in a typical scheduling scenario:
>
> struct task_ctx *lookup_task_ctx(struct task_struct *p)
> {
>         struct task_ctx *taskc;
>         s32 pid = p->pid;
>
>         taskc = bpf_map_lookup_elem(&task_data, &pid);
>         if (!taskc)
>                 bpf_throw(-ENOENT); // Verifier
>
>         return taskc;
> }
>
> void BPF_STRUCT_OPS(sched_stopping, struct task_struct *p, bool runnable)
> {
>         struct task_ctx *taskc;
>
>         taskc = lookup_task_ctx(p)
>
>         /* scale the execution time by the inverse of the weight and charge */
>         p->scx.dsq_vtime +=
>                 (bpf_ktime_get_ns() - taskc->running_at) * 100 / p->scx.weight;
> }
>
> We're not dropping a reference here, but taskc is preserved across the
> bpf_throw() path, so the same idea applies.
>

Yeah, I will add an example like this to the selftests to make sure we
also exercise such a pattern.

> > Also,  I think the test is a bit fragile, I should probably rewrite it
> > in inline assembly, because while the compiler chooses to hold it in a
> > register here, it is not bound to do so in this case.
>
> To that point, I wonder if it would be useful or possible to come up with some
> kind of a macro that allows us to specify a list of variables that must be
> preserved after a bpf_throw() call? Not sure how or if that would work exactly.
>

I think it can be useful, supposedly if we can force the compiler to
do a spill to the stack, that will be enough to enable unwinding.
But we should probably come back to this in case we see there are
certain compiler optimizations causing trouble.
Otherwise it's unnecessary cognitive overhead for someone writing a
program to have to explicitly mark variables like this.

> > > >  [...]
> > > >
> > > > -SEC("?tc")
> > > > -__failure __msg("Unreleased reference")
> > > > -int reject_with_reference(void *ctx)
> > > > -{
> > > > -     struct foo *f;
> > > > -
> > > > -     f = bpf_obj_new(typeof(*f));
> > > > -     if (!f)
> > > > -             return 0;
> > > > -     bpf_throw(0);
> > >
> > > Hmm, so why is this a memory leak exactly? Apologies if this is already
> > > explained clearly elsewhere in the stack.
> > >
> >
> > I will add comments around some of these to better explain this in the
> > non-RFC v1.
> > Basically, this program is sort of unrealistic (since it's always
> > throwing, and not really cleaning up the object since there is no
> > other path except the one with bpf_throw). So the compiler ends up
> > putting 'f' in a caller-saved register, during release_reference we
> > don't find it after bpf_throw has been processed (since caller-saved
> > regs have been cleared due to kfunc processing, and we generate frame
> > descriptors after check_kfunc_call, basically simulating the state
> > where only preserved state after the call is observed at runtime), but
> > the reference state still lingers around for 'f', so you get this
> > "Unreleased reference" error later when check_reference_leak is hit.
> >
> > It's just trying to exercise the case where the pointer tied to a
> > reference state has been lost in verifier state, and that we return an
> > error in such a case and don't succeed in verifying the program
> > accidently (because there is no way we can recover the value to free
> > at runtime).
>
> Makes total sense, thanks a lot for explaining!
>
> This looks great, I'm really excited to use it.

Thanks for the review! Will respin addressing your comments after
waiting for a day or two.