Re: [PATCH RESEND bpf-next v3 4/9] bpf: support attaching freplace programs to multiple attach points

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> · Tue, 15 Sep 2020 13:35:28 +0200

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:

> On Mon, Sep 14, 2020 at 9:08 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
>>
>> > On Fri, Sep 11, 2020 at 3:01 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>> >>
>> >> From: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
>> >>
>> >> This enables support for attaching freplace programs to multiple attach
>> >> points. It does this by amending UAPI for bpf_raw_tracepoint_open with a
>> >> target prog fd and btf ID pair that can be used to supply the new
>> >> attachment point. The target must be compatible with the target that was
>> >> supplied at program load time.
>> >>
>> >> The implementation reuses the checks that were factored out of
>> >> check_attach_btf_id() to ensure compatibility between the BTF types of the
>> >> old and new attachment. If these match, a new bpf_tracing_link will be
>> >> created for the new attach target, allowing multiple attachments to
>> >> co-exist simultaneously.
>> >>
>> >> The code could theoretically support multiple-attach of other types of
>> >> tracing programs as well, but since I don't have a use case for any of
>> >> those, the bpf_tracing_prog_attach() function will reject new targets for
>> >> anything other than PROG_TYPE_EXT programs.
>> >>
>> >> Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
>> >> ---
>> >
>> > It feels like using a semi-constructed bpf_tracing_link inside
>> > prog->aux->tgt_link is just an unnecessary complication, after reading
>> > this and previous patches. Seems more straightforward and simpler to
>> > store tgt_attach_type/tgt_prog_type (permanently) and
>> > tgt_prog/tgt_trampoline (until first attachment) in prog->aux and then
>> > properly create bpf_link on attach.
>>
>> I updated v4 with your comments, but kept the link in prog->aux; the
>> reason being that having a container for the two pointers makes it
>> possible to atomically swap it out with xchg() as you suggested
>> previously. Could you please take a look at v4? If you still think it's
>> better to just keep two separate pointers (and add a lock) in prog->aux,
>> I can change it to that. But I'd rather avoid the lock if possible...
>
> I took a very quick look at this specific bit, planning to do another
> pass tomorrow.
>
> What's the problem with adding a mutex to bpf_prog_aux? In your case,
> now you introduced (unlikely, but still) extra state transition for
> tgt_link from non-NULL to NULL and then back to non-NULL? And why?
> Just to use atomic xchg, while using atomic operation is not an
> absolute necessity because it's not a performance-critical path at
> all. We are not optimizing for millions of freplace attachments a
> second, right? On the other hand, having a mutex there won't require
> restoration logic, it will be dead simple, obvious and
> straightforward. So yeah, I still think mutex is better there.

So I went ahead and implemented a mutex-based version of this. I'm not
sure I agree with "dead simple", I'd say it's on par with the previous
version; and that is only if I explicitly limit the scope of the mutex
to *writing* of the tgt_* pointers (i.e., I haven't gone through and
protected all the reads from within the verifier).

The mutex version does have the benefit of still making it possible to
retry a bpf_raw_tracepoint_open() if it fails, so I guess that is a
benefit; I'll post it as v5 and you can have a look :)

> BTW, check Stanislav's latest patch set. He's adding used_maps_mutex
> to bpf_prog_aux with no problems at all. It seems to me that we might
> want to generalize that used_maps_mutex to be just bpf_prog_aux's
> mutex ('prog_aux_mutex' or whatever we'd call it) and use it for such
> kinds of low-frequency bpf_prog metadata manipulations/checks.

I'm not sure I like the idea of widening the scope of the mutex. Or at
least I think that should be done as a follow-up patch that does a
proper analysis of all the different fields it is supposed to protect
and makes sure no deadlocks creep in.

-Toke