On Thu, Aug 15, 2024 at 1:48 PM Jiri Olsa <olsajiri@xxxxxxxxx> wrote: > > On Thu, Aug 08, 2024 at 08:43:05AM -0700, Alexei Starovoitov wrote: > > On Thu, Aug 8, 2024 at 3:46 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote: > > > > > > On Tue, Aug 06, 2024 at 11:44:52AM -0700, Alexei Starovoitov wrote: > > > > On Tue, Aug 6, 2024 at 6:24 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote: > > > > > > > > > > > Jiri, > > > > > > > > > > > > the verifier removes the check because it assumes that pointers > > > > > > passed by the kernel into tracepoint are valid and trusted. > > > > > > In this case: > > > > > > trace_sched_pi_setprio(p, pi_task); > > > > > > > > > > > > pi_task can be NULL. > > > > > > > > > > > > We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL > > > > > > by default, since it will break a bunch of progs. > > > > > > Instead we can annotate this tracepoint arg as __nullable and > > > > > > teach the verifier to recognize such special arguments of tracepoints. > > > > > > > > > > ok, so you mean to be able to mark it in event header like: > > > > > > > > > > TRACE_EVENT(sched_pi_setprio, > > > > > TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task __nullable), > > > > > > > > > > I guess we could make pahole to emit DECL_TAG for that argument, > > > > > but I'm not sure how to propagate that __nullable info to pahole > > > > > > > > > > while wondering about that, I tried the direct fix below ;-) > > > > > > > > We don't need to rush such a hack below. > > > > No need to add decl_tag and change pahole either. > > > > The arg name is already vmlinux BTF: > > > > [51371] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3 > > > > '__data' type_id=61 > > > > 'tsk' type_id=77 > > > > 'pi_task' type_id=77 > > > > [51372] FUNC '__bpf_trace_sched_pi_setprio' type_id=51371 linkage=static > > > > > > > > just need to rename "pi_task" to "pi_task__nullable" > > > > and teach the verifier. > > > > > > the problem is that btf_trace_<xxx> is typedef > > > > > > typedef void (*btf_trace_##call)(void *__data, proto); > > > > > > and dwarf does not store argument names for subroutine type entry, > > > so it's not in BTF's TYPEDEF either > > > > > > it's the btf_trace_##call typedef ID that verifier has to work with, > > > I wonder we could somehow associate that ID with __bpf_trace_##call > > > subroutine entry which has the argument names > > > > > > we could store __bpf_trace_##call's BTF_ID in __bpf_raw_tp_map record, > > > but we'd need to do the lookup based on the tracepoint name when loading > > > the program .. ATM we do the lookup __bpf_raw_tp_map record only when > > > doing attach, so we would need to move it to program load time > > > > > > or we could 'fix' the argument names in pahole, but that'd probably > > > mean extra setup and hash lookup, so also not great > > > > I would do a simple string search in vmlinux BTF for "__bpf_trace" + tp name. > > No need to add btf_id-s and waste memory to speed up the slow path. > > I checked bit more and there are more tracepoints with the same issue, > the first diff stat looks like: > > include/trace/events/afs.h | 44 ++++++++++++++++++++++---------------------- > include/trace/events/cachefiles.h | 96 ++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------ > include/trace/events/ext4.h | 6 +++--- > include/trace/events/fib.h | 16 ++++++++-------- > include/trace/events/filelock.h | 38 +++++++++++++++++++------------------- > include/trace/events/host1x.h | 10 +++++----- > include/trace/events/huge_memory.h | 24 ++++++++++++------------ > include/trace/events/kmem.h | 18 +++++++++--------- > include/trace/events/netfs.h | 16 ++++++++-------- > include/trace/events/power.h | 6 +++--- > include/trace/events/qdisc.h | 8 ++++---- > include/trace/events/rxrpc.h | 12 ++++++------ > include/trace/events/sched.h | 12 ++++++------ > include/trace/events/sunrpc.h | 8 ++++---- > include/trace/events/tcp.h | 14 +++++++------- > include/trace/events/tegra_apb_dma.h | 6 +++--- > include/trace/events/timer_migration.h | 10 +++++----- > include/trace/events/writeback.h | 16 ++++++++-------- > > plus there's one case where pointer needs to be checked with IS_ERR in > include/trace/events/rdma_core.h trace_mr_alloc/mr_integ_alloc > > I'm not excited about the '_nullable' argument suffix, because it's lot > of extra changes/renames in TP_fast_assign and it does not solve the > IS_ERR case above > > I checked on the type tag and with llvm build we get the TYPE_TAG info > nicely in BTF: > > [119148] TYPEDEF 'btf_trace_sched_pi_setprio' type_id=119149 > [119149] PTR '(anon)' type_id=119150 > [119150] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3 > '(anon)' type_id=27 > '(anon)' type_id=678 > '(anon)' type_id=119152 > [119151] TYPE_TAG 'nullable' type_id=679 > [119152] PTR '(anon)' type_id=119151 > > [679] STRUCT 'task_struct' size=15424 vlen=277 > > which we can easily check in verifier.. the tracepoint definition would look like: > > - TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task), > + TP_PROTO(struct task_struct *tsk, struct task_struct __nullable *pi_task), > > and no other change in TP_fast_assign is needed > > I think using the type tag for this is nicer, but I'm not sure where's > gcc at with btf_type_tag implementation, need to check on that Unfortunately last time I heard gcc was still far. So we cannot rely on decl_tag or type_tag yet. Aside from __nullable we would need another suffix to indicate is_err. Maybe we can do something with the TP* macro? So the suffix only seen one place instead of search-and-replace through the body? but imo above diff stat doesn't look too bad.