Re: [External] Re: [PATCH bpf-next v2 1/9] bpf: tracing: add support to record and check the accessed args

梦龙董 <dongmenglong.8@xxxxxxxxxxxxx> · Thu, 28 Mar 2024 22:43:46 +0800

On Fri, Mar 15, 2024 at 4:00 PM 梦龙董 <dongmenglong.8@xxxxxxxxxxxxx> wrote:
>
> On Thu, Mar 14, 2024 at 8:27 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Tue, Mar 12, 2024 at 6:53 PM 梦龙董 <dongmenglong.8@xxxxxxxxxxxxx> wrote:
> [......]
> > > What does "a hundred attachments max" means? Can't I
> > > trace thousands of kernel functions with a bpf program of
> > > tracing multi-link?
> >
> > I mean what time does it take to attach one program
> > to 100 fentry-s ?
> > What is the time for 1k and for 10k ?
> >
> > The kprobe multi test attaches to pretty much all funcs in
> > /sys/kernel/tracing/available_filter_functions
> > and it's fast enough to run in test_progs on every commit in bpf CI.
> > See get_syms() in prog_tests/kprobe_multi_test.c
> >
> > Can this new multi fentry do that?
> > and at what speed?
> > The answer will decide how applicable this api is going to be.
> > Generating different trampolines for every attach point
> > is an approach as well. Pls benchmark it too.
>
> I see. Creating plenty of trampolines does take a lot of time,
> and I'll do testing on it.
>

I have done a simple benchmark on creating 1000
trampolines. It is slow, quite slow, which consume up to
60s. We can't do it this way.

Now, I have a bad idea. How about we introduce
a "dynamic trampoline"? The basic logic of it can be:

"""
save regs
bpfs = trampoline_lookup_ip(ip)
fentry = bpfs->fentries
while fentry:
  fentry(ctx)
  fentry = fentry->next

call origin
save return value

fexit = bpfs->fexits
while fexit:
  fexit(ctx)
  fexit = fexit->next

xxxxxx
"""

And we lookup the "bpfs" by the function ip in a hash map
in trampoline_lookup_ip. The type of "bpfs" is:

struct bpf_array {
  struct bpf_prog *fentries;
 struct bpf_prog *fexits;
  struct bpf_prog *modify_returns;
}

When we need to attach the bpf progA to function A/B/C,
we only need to create the bpf_arrayA, bpf_arrayB, bpf_arrayC
and add the progA to them, and insert them to the hash map
"direct_call_bpfs", and attach the "dynamic trampoline" to
A/B/C. If bpf_arrayA exist, just add progA to the tail of
bpf_arrayA->fentries. When we need to attach progB to
B/C, just add progB to bpf_arrayB->fentries and
bpf_arrayB->fentries.

Compared to the trampoline, extra overhead is introduced
by the hash lookuping.

I have not begun to code yet, and I am not sure the overhead is
acceptable. Considering that we also need to do hash lookup
by the function in kprobe_multi, maybe the overhead is
acceptable?

Thanks!
Menglong Dong

> >
> > > >
> > > > Let's step back.
> [......]
> >
> > For one trampoline to handle all attach points we might
> > need some arch support, but we can start simple.
> > Make btf_func_model with MAX_BPF_FUNC_REG_ARGS
> > by calling btf_distill_func_proto() with func==NULL.
> > And use that to build a trampoline.
> >
> > The challenge is how to use minimal number of trampolines
> > when bpf_progA is attached for func1, func2, func3
> > and bpf_progB is attached to func3, func4, func5.
> > We'd still need 3 trampolines:
> > for func[12] to call bpf_progA,
> > for func3 to call bpf_progA and bpf_progB,
> > for func[45] to call bpf_progB.
> >
> > Jiri was trying to solve it in the past. His slides from LPC:
> > https://lpc.events/event/16/contributions/1350/attachments/1033/1983/plumbers.pdf
> >
> > Pls study them and his prior patchsets to avoid stepping on the same rakes.