Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)

On Thu, Aug 15, 2024 at 1:48 PM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Thu, Aug 08, 2024 at 08:43:05AM -0700, Alexei Starovoitov wrote:
> > On Thu, Aug 8, 2024 at 3:46 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > >
> > > On Tue, Aug 06, 2024 at 11:44:52AM -0700, Alexei Starovoitov wrote:
> > > > On Tue, Aug 6, 2024 at 6:24 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > > > >
> > > > > > Jiri,
> > > > > >
> > > > > > the verifier removes the check because it assumes that pointers
> > > > > > passed by the kernel into tracepoint are valid and trusted.
> > > > > > In this case:
> > > > > >         trace_sched_pi_setprio(p, pi_task);
> > > > > >
> > > > > > pi_task can be NULL.
> > > > > >
> > > > > > We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
> > > > > > by default, since it will break a bunch of progs.
> > > > > > Instead we can annotate this tracepoint arg as __nullable and
> > > > > > teach the verifier to recognize such special arguments of tracepoints.
> > > > >
> > > > > ok, so you mean to be able to mark it in event header like:
> > > > >
> > > > >   TRACE_EVENT(sched_pi_setprio,
> > > > >         TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task __nullable),
> > > > >
> > > > > I guess we could make pahole to emit DECL_TAG for that argument,
> > > > > but I'm not sure how to propagate that __nullable info to pahole
> > > > >
> > > > > while wondering about that, I tried the direct fix below ;-)
> > > >
> > > > We don't need to rush such a hack below.
> > > > No need to add decl_tag and change pahole either.
> > > > The arg name is already in vmlinux BTF:
> > > > [51371] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3
> > > >         '__data' type_id=61
> > > >         'tsk' type_id=77
> > > >         'pi_task' type_id=77
> > > > [51372] FUNC '__bpf_trace_sched_pi_setprio' type_id=51371 linkage=static
> > > >
> > > > just need to rename "pi_task" to "pi_task__nullable"
> > > > and teach the verifier.
> > >
> > > the problem is that btf_trace_<xxx> is typedef
> > >
> > >   typedef void (*btf_trace_##call)(void *__data, proto);
> > >
> > > and dwarf does not store argument names for subroutine type entry,
> > > so it's not in BTF's TYPEDEF either
> > >
> > > it's the btf_trace_##call typedef ID that verifier has to work with,
> > > I wonder we could somehow associate that ID with __bpf_trace_##call
> > > subroutine entry which has the argument names
> > >
> > > we could store __bpf_trace_##call's BTF_ID in __bpf_raw_tp_map record,
> > > but we'd need to do the lookup based on the tracepoint name when loading
> > > the program .. ATM we do the lookup __bpf_raw_tp_map record only when
> > > doing attach, so we would need to move it to program load time
> > >
> > > or we could 'fix' the argument names in pahole, but that'd probably
> > > mean extra setup and hash lookup, so also not great
> >
> > I would do a simple string search in vmlinux BTF for "__bpf_trace" + tp name.
> > No need to add btf_id-s and waste memory to speed up the slow path.
>
> I checked bit more and there are more tracepoints with the same issue,
> the first diff stat looks like:
>
>          include/trace/events/afs.h                            | 44 ++++++++++++++++++++++----------------------
>          include/trace/events/cachefiles.h                     | 96 ++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------
>          include/trace/events/ext4.h                           |  6 +++---
>          include/trace/events/fib.h                            | 16 ++++++++--------
>          include/trace/events/filelock.h                       | 38 +++++++++++++++++++-------------------
>          include/trace/events/host1x.h                         | 10 +++++-----
>          include/trace/events/huge_memory.h                    | 24 ++++++++++++------------
>          include/trace/events/kmem.h                           | 18 +++++++++---------
>          include/trace/events/netfs.h                          | 16 ++++++++--------
>          include/trace/events/power.h                          |  6 +++---
>          include/trace/events/qdisc.h                          |  8 ++++----
>          include/trace/events/rxrpc.h                          | 12 ++++++------
>          include/trace/events/sched.h                          | 12 ++++++------
>          include/trace/events/sunrpc.h                         |  8 ++++----
>          include/trace/events/tcp.h                            | 14 +++++++-------
>          include/trace/events/tegra_apb_dma.h                  |  6 +++---
>          include/trace/events/timer_migration.h                | 10 +++++-----
>          include/trace/events/writeback.h                      | 16 ++++++++--------
>
> plus there's one case where pointer needs to be checked with IS_ERR in
> include/trace/events/rdma_core.h trace_mr_alloc/mr_integ_alloc
>
> I'm not excited about the '__nullable' argument suffix, because it's a lot
> of extra changes/renames in TP_fast_assign and it does not solve the
> IS_ERR case above
>
> I checked on the type tag and with llvm build we get the TYPE_TAG info
> nicely in BTF:
>
>         [119148] TYPEDEF 'btf_trace_sched_pi_setprio' type_id=119149
>         [119149] PTR '(anon)' type_id=119150
>         [119150] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3
>                 '(anon)' type_id=27
>                 '(anon)' type_id=678
>                 '(anon)' type_id=119152
>         [119151] TYPE_TAG 'nullable' type_id=679
>         [119152] PTR '(anon)' type_id=119151
>
>         [679] STRUCT 'task_struct' size=15424 vlen=277
>
> which we can easily check in verifier.. the tracepoint definition would look like:
>
>         -       TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
>         +       TP_PROTO(struct task_struct *tsk, struct task_struct __nullable *pi_task),
>
> and no other change in TP_fast_assign is needed
>
> I think using the type tag for this is nicer, but I'm not sure where's
> gcc at with btf_type_tag implementation, need to check on that

Unfortunately, last time I heard, gcc was still far from supporting btf_type_tag.
So we cannot rely on decl_tag or type_tag yet.
Aside from __nullable we would need another suffix to indicate is_err.

Maybe we can do something with the TP* macro?
So that the suffix is only seen in one place instead of doing
search-and-replace through the body?

But imo the above diff stat doesn't look too bad.
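
For illustration only, here is a rough userspace sketch of the name-based
approach discussed in this thread: build the "__bpf_trace_" + tp name string
to search for in vmlinux BTF, and treat an argument whose name ends in
"__nullable" as possibly NULL. The helper names arg_name_is_nullable() and
bpf_trace_func_name() are hypothetical and are not existing verifier code;
the actual BTF lookup is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NULLABLE_SUFFIX  "__nullable"
#define BPF_TRACE_PREFIX "__bpf_trace_"

/*
 * Hypothetical helper: return true if a tracepoint argument name carries
 * the "__nullable" suffix, e.g. "pi_task__nullable". A name that consists
 * of the suffix alone is not treated as annotated.
 */
static bool arg_name_is_nullable(const char *name)
{
	size_t len = strlen(name);
	size_t slen = strlen(NULLABLE_SUFFIX);

	return len > slen && strcmp(name + len - slen, NULLABLE_SUFFIX) == 0;
}

/*
 * Hypothetical helper: build the "__bpf_trace_<tp>" function name that
 * would be searched for in vmlinux BTF. Returns 0 on success, or -1 if
 * the result does not fit in buf.
 */
static int bpf_trace_func_name(char *buf, size_t size, const char *tp_name)
{
	int n = snprintf(buf, size, BPF_TRACE_PREFIX "%s", tp_name);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With these, "sched_pi_setprio" maps to "__bpf_trace_sched_pi_setprio", and
an argument renamed to "pi_task__nullable" would be flagged while "tsk"
would not. An is_err case would need a second suffix checked the same way.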




