Re: [PATCH RFC bpf-next 0/4] bpf: Add support to attach return prog in kprobe multi

Viktor Malik <vmalik@xxxxxxxxxx> · Mon, 19 Feb 2024 12:20:08 +0100

On 2/13/24 22:09, Jiri Olsa wrote:
> On Tue, Feb 13, 2024 at 10:20:46AM -0800, Andrii Nakryiko wrote:
>> On Tue, Feb 13, 2024 at 4:09 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>>>
>>> On Mon, Feb 12, 2024 at 08:06:06PM -0800, Andrii Nakryiko wrote:
>>>
>>> SNIP
>>>
>>>>>> But the way you implement it with extra flag and extra fd parameter
>>>>>> makes it harder to have a nice high-level support in libbpf (and
>>>>>> presumably other BPF loader libraries) for this.
>>>>>>
>>>>>> When I was thinking about doing something like this, I was considering
>>>>>> adding a new program type, actually. That way it's possible to define
>>>>>> this "let's skip return probe" protocol without backwards
>>>>>> compatibility concerns. It's easier to use it declaratively in libbpf.
>>>>>
>>>>> ok, that seems cleaner.. but we need to use current kprobe programs,
>>>>> so not sure at the moment how would that fit in.. did you mean new
>>>>> link type?
>>>>
>>>> It's kind of a less important detail, actually. New program type would
>>>> allow us to have an entirely different context type, but I think we
>>>> can make do with the existing kprobe program type. We can have a
>>>> separate attach_type and link type, just like multi-kprobe and
>>>> multi-uprobe are still kprobe programs.
>>>
>>> ok, having new attach type on top of kprobe_multi link makes sense
>>>
>>>>
>>>>>
>>>>>> You just declare SEC("kprobe.wrap/...") (or whatever the name,
>>>>>> something to designate that it's both entry and exit probe) as one
>>>>>> program and in the code there would be some way to determine whether
>>>>>> we are in entry mode or exit mode (helper or field in the custom
>>>>>> context type, the latter being faster and more usable, but it's
>>>>>> probably not critical).
>>>>>
>>>>> hum, so the single program would be for both entry and exit probe,
>>>>> I'll need to check how bad it'd be for us, but it'd probably mean
>>>>> just one extra tail call, so it's likely ok
>>>>
>>>> I guess, I don't know what you are doing there :) I'd recommend
>>>> looking at utilizing BPF global subprogs instead of tail calls, if
>>>> your kernel allows for that, as that's a saner way to scale BPF
>>>> verification.
>>>
>>> ok, we should probably do that.. given this enhancement will be
>>> available on latest kernel anyway, we could use global subprogs
>>> as well
>>>
>>> the related bpftrace might be bit more challenging.. will have to
>>> generate program calling entry or return program now, but seems
>>> doable of course
>>
>> So you want users to still have separate kprobe and kretprobe in
>> bpftrace, but combine them into this kwrapper transparently? It does
> 
> no I meant I'd need to generate the wrapper program for the new
> interface.. which is extra compared to current bpftrace changes

If you end up introducing this new kwrapper program type in libbpf, I
think that it'll make sense to have something similar in bpftrace, too.
Allowing users to write separate kprobe and kretprobe programs and
transparently combining them into kwrapper doesn't seem to bring much
value to me.

Jirka, if you need help with implementing bpftrace support for this, let
me know. I'm very interested in having this capability there.

Viktor

> 
>> seem doable, but hopefully we'll be able to write kwrapper programs in
>> bpftrace directly as well.
> 
> yes, it should be fine
> 
> SNIP
> 
>>>>
>>>> Yes, I realize special-casing zero might be a bit inconvenient, but I
>>>> think simplicity trumps a potential for zero to be a valid value (and
>>>> there are always ways to work around zero as a meaningful value).
>>>>
>>>> Now, in more complicated cases 8 bytes of temporary session state
>>>> isn't enough, just like BPF cookie being 8 byte (read-only) value
>>>> might not be enough. But the solution is the same as with the BPF
>>>> cookie. You just use those 8 bytes as a key into ARRAY/HASHMAP/whatnot
>>>> storage. It's simple and fast enough for pretty much any case.
>>>
>>> I was recently asked for a way to have function arguments available
>>> in the return kprobe as it is in fexit programs (which was not an
>>> option to use, because we don't have fast multi attach for it)
>>>
>>> using the hash map to store arguments and storing its key in the
>>> session data might be solution for this
>>
>> if you are ok using hashmap keyed by tid, you can do it today without
>> any kernel changes. With session cookie you'll be able to utilize
>> faster ARRAY map (by building a simple ID allocator to get a free slot
>> in ARRAY map).
> 
> ok
> 
> SNIP
> 
>>>> I bet there is something similar in the kretprobe case, where we can
>>>> carve out 8 bytes and pass it to both entry and exit parts of kwrapper
>>>> program.
>>>
>>> for kprobes.. both kprobe and kprobe_multi/fprobe use rethook to invoke
>>> return probes, so I guess we could use it and store that shared data
>>> in there
>>>
>>> btw Masami is in process of removing rethook from kprobe_multi/fprobe,
>>> as part of migrating fprobe on top of ftrace [0]
>>>
>>> but instead the rethook I think there'll be some sort of shadow stack/data
>>> area accessible from both entry and return probes, that we could use
>>
>> ok, cool. We also need to be careful to not share session cookie
>> between unrelated programs. E.g., if two independent kwrapper programs
>> are attached to the same function, they should each have their own
>> cookie. Otherwise it's not clear how to build anything reliable on top
>> of that, tbh. This might be a problem, though, right?
> 
> IIRC it's tracer specific data, the shadow stack data should be unique
> for tracer and its called function, but I'll double check on that
> 
> jirka
>