On Wed, Mar 30, 2022 at 2:11 PM Song Liu <songliubraving@xxxxxx> wrote: > > > > > On Mar 29, 2022, at 11:43 PM, Song Liu <songliubraving@xxxxxx> wrote: > > > > > > > >> On Mar 29, 2022, at 7:47 PM, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > >> > >> On Tue, Mar 29, 2022 at 5:39 PM Song Liu <songliubraving@xxxxxx> wrote: > >>> > >>> > >>> > >>>> On Mar 29, 2022, at 5:00 PM, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > >>>> > >>>> On Tue, Mar 29, 2022 at 4:19 PM Song Liu <song@xxxxxxxxxx> wrote: > >>>>> > >>>>> TP_PROTO of sched_switch is updated with a new arg prev_state, which > >>>>> causes runqslower load failure: > >>>>> > >>>>> libbpf: prog 'handle__sched_switch': BPF program load failed: Permission denied > >>>>> libbpf: prog 'handle__sched_switch': -- BEGIN PROG LOAD LOG -- > >>>>> R1 type=ctx expected=fp > >>>>> 0: R1=ctx(off=0,imm=0) R10=fp0 > >>>>> ; int handle__sched_switch(u64 *ctx) > >>>>> 0: (bf) r7 = r1 ; R1=ctx(off=0,imm=0) R7_w=ctx(off=0,imm=0) > >>>>> ; struct task_struct *next = (struct task_struct *)ctx[2]; > >>>>> 1: (79) r6 = *(u64 *)(r7 +16) > >>>>> func 'sched_switch' arg2 has btf_id 186 type STRUCT 'task_struct' > >>>>> 2: R6_w=ptr_task_struct(off=0,imm=0) R7_w=ctx(off=0,imm=0) > >>>>> ; struct task_struct *prev = (struct task_struct *)ctx[1]; > >>>>> 2: (79) r2 = *(u64 *)(r7 +8) ; R2_w=scalar() R7_w=ctx(off=0,imm=0) > >>>>> 3: (b7) r1 = 0 ; R1_w=0 > >>>>> ; struct runq_event event = {}; > >>>>> 4: (7b) *(u64 *)(r10 -8) = r1 ; R1_w=P0 R10=fp0 fp-8_w=00000000 > >>>>> 5: (7b) *(u64 *)(r10 -16) = r1 ; R1_w=P0 R10=fp0 fp-16_w=00000000 > >>>>> 6: (7b) *(u64 *)(r10 -24) = r1 ; R1_w=P0 R10=fp0 fp-24_w=00000000 > >>>>> 7: (7b) *(u64 *)(r10 -32) = r1 ; R1_w=P0 R10=fp0 fp-32_w=00000000 > >>>>> ; if (prev->__state == TASK_RUNNING) > >>>>> 8: (61) r1 = *(u32 *)(r2 +24) > >>>>> R2 invalid mem access 'scalar' > >>>>> processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 > >>>>> -- END PROG LOAD LOG -- > >>>>> libbpf: failed to load program 'handle__sched_switch' > >>>>> libbpf: failed to load object 'runqslower_bpf' > >>>>> libbpf: failed to load BPF skeleton 'runqslower_bpf': -13 > >>>>> failed to load BPF object: -13 > >>>>> > >>>>> Update runqslower to fix this issue. Also, as we are on this, use BPF_PROG > >>>>> in runqslower for cleaner code. > >>>>> > >>>>> Fixes: fa2c3254d7cf ("sched/tracing: Don't re-read p->state when emitting sched_switch event") > >>>>> Signed-off-by: Song Liu <song@xxxxxxxxxx> > >>>>> --- > >>>>> tools/bpf/runqslower/runqslower.bpf.c | 19 +++++-------------- > >>>>> 1 file changed, 5 insertions(+), 14 deletions(-) > >>>>> > >>>> > >>>> It would be much less disruptive if that prev_state was added after > >>>> "next", but oh well... > >>> > >>> Maybe we should change that. > >>> > >>> +Valentin and Steven, how about we change the order with the attached > >>> diff (not the original patch in this thread, but the one at the end of > >>> this email)? > >>> > >>>> > >>>> But anyways, let's handle this in a way that can handle both old > >>>> kernels and new ones and do the same change in libbpf-tool's > >>>> runqslower ([0]). Can you please follow up there as well? > >>> > >>> Yeah, I will also fix that one. > >> > >> Thanks! > >> > >>> > >>>> > >>>> > >>>> We can use BPF CO-RE to detect which order of arguments running kernel > >>>> has by checking prev_state field existence in struct > >>>> trace_event_raw_sched_switch. Can you please try that? Use > >>>> bpf_core_field_exists() for that. > >>> > >>> Do you mean something like > >>> > >>> if (bpf_core_field_exists(ctx->prev_state)) > >>> /* use ctx[2] and ctx[3] */ > >>> else > >>> /* use ctx[1] and ctx[2] */ > >> > >> yep, that's what I meant, except you don't have ctx->prev_state, you have to do: > >> > >> if (bpf_core_field_exists(((struct trace_event_raw_sched_switch > >> *)0)->prev_state)) > > Actually, struct trace_event_raw_sched_switch is not the arguments we > have for tp_btf. For both older and newer kernel, it is the same: > > struct trace_event_raw_sched_switch { > struct trace_entry ent; > char prev_comm[16]; > pid_t prev_pid; > int prev_prio; > long int prev_state; > char next_comm[16]; > pid_t next_pid; > int next_prio; > char __data[0]; > }; > > So I guess this check won't work? Ah, you are right, we had prev_state in this struct before. There seems to be func proto for each raw tracepoint: typedef void (*btf_trace_sched_switch)(void *, bool, unsigned int, struct task_struct *, struct task_struct *); But I'm not sure if we can use that. I'll need to play with it a bit tomorrow to see if any of bpf_core_xxx() checks can be used. > > Thanks, > Song >