> On Mar 29, 2022, at 7:47 PM, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Tue, Mar 29, 2022 at 5:39 PM Song Liu <songliubraving@xxxxxx> wrote:
>>
>>
>>
>>> On Mar 29, 2022, at 5:00 PM, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>>>
>>> On Tue, Mar 29, 2022 at 4:19 PM Song Liu <song@xxxxxxxxxx> wrote:
>>>>
>>>> TP_PROTO of sched_switch is updated with a new arg prev_state, which
>>>> causes runqslower load failure:
>>>>
>>>> libbpf: prog 'handle__sched_switch': BPF program load failed: Permission denied
>>>> libbpf: prog 'handle__sched_switch': -- BEGIN PROG LOAD LOG --
>>>> R1 type=ctx expected=fp
>>>> 0: R1=ctx(off=0,imm=0) R10=fp0
>>>> ; int handle__sched_switch(u64 *ctx)
>>>> 0: (bf) r7 = r1                       ; R1=ctx(off=0,imm=0) R7_w=ctx(off=0,imm=0)
>>>> ; struct task_struct *next = (struct task_struct *)ctx[2];
>>>> 1: (79) r6 = *(u64 *)(r7 +16)
>>>> func 'sched_switch' arg2 has btf_id 186 type STRUCT 'task_struct'
>>>> 2: R6_w=ptr_task_struct(off=0,imm=0) R7_w=ctx(off=0,imm=0)
>>>> ; struct task_struct *prev = (struct task_struct *)ctx[1];
>>>> 2: (79) r2 = *(u64 *)(r7 +8)          ; R2_w=scalar() R7_w=ctx(off=0,imm=0)
>>>> 3: (b7) r1 = 0                        ; R1_w=0
>>>> ; struct runq_event event = {};
>>>> 4: (7b) *(u64 *)(r10 -8) = r1         ; R1_w=P0 R10=fp0 fp-8_w=00000000
>>>> 5: (7b) *(u64 *)(r10 -16) = r1        ; R1_w=P0 R10=fp0 fp-16_w=00000000
>>>> 6: (7b) *(u64 *)(r10 -24) = r1        ; R1_w=P0 R10=fp0 fp-24_w=00000000
>>>> 7: (7b) *(u64 *)(r10 -32) = r1        ; R1_w=P0 R10=fp0 fp-32_w=00000000
>>>> ; if (prev->__state == TASK_RUNNING)
>>>> 8: (61) r1 = *(u32 *)(r2 +24)
>>>> R2 invalid mem access 'scalar'
>>>> processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>>>> -- END PROG LOAD LOG --
>>>> libbpf: failed to load program 'handle__sched_switch'
>>>> libbpf: failed to load object 'runqslower_bpf'
>>>> libbpf: failed to load BPF skeleton 'runqslower_bpf': -13
>>>> failed to load BPF object: -13
>>>>
>>>> Update runqslower to fix this issue. Also, as we are on this, use BPF_PROG
>>>> in runqslower for cleaner code.
>>>>
>>>> Fixes: fa2c3254d7cf ("sched/tracing: Don't re-read p->state when emitting sched_switch event")
>>>> Signed-off-by: Song Liu <song@xxxxxxxxxx>
>>>> ---
>>>> tools/bpf/runqslower/runqslower.bpf.c | 19 +++++--------------
>>>> 1 file changed, 5 insertions(+), 14 deletions(-)
>>>>
>>>
>>> It would be much less disruptive if that prev_state was added after
>>> "next", but oh well...
>>
>> Maybe we should change that.
>>
>> +Valentin and Steven, how about we change the order with the attached
>> diff (not the original patch in this thread, but the one at the end of
>> this email)?
>>
>>>
>>> But anyways, let's handle this in a way that can handle both old
>>> kernels and new ones and do the same change in libbpf-tool's
>>> runqslower ([0]). Can you please follow up there as well?
>>
>> Yeah, I will also fix that one.
>
> Thanks!
>
>>
>>>
>>>
>>> We can use BPF CO-RE to detect which order of arguments running kernel
>>> has by checking prev_state field existence in struct
>>> trace_event_raw_sched_switch. Can you please try that? Use
>>> bpf_core_field_exists() for that.
>>
>> Do you mean something like
>>
>> if (bpf_core_field_exists(ctx->prev_state))
>>         /* use ctx[2] and ctx[3] */
>> else
>>         /* use ctx[1] and ctx[2] */
>
> yep, that's what I meant, except you don't have ctx->prev_state, you have to do:
>
> if (bpf_core_field_exists(((struct trace_event_raw_sched_switch
> *)0)->prev_state))
>
>>
>> ?
>> I think we will need BTF for the arguments, which doesn't exist yet.
>> Did I miss something?
>
> Probably :) struct trace_event_raw_sched_switch is in vmlinux.h
> already for non-raw sched:sched_switch tracepoint. We just use that
> type information for feature probing. From BPF verifier's perspective,
> ctx[1] or ctx[2] will have proper BTF information (task_struct) during
> program validation.

Sigh. I ran pahole and saw trace_event_raw_sched_switch. Somehow I
thought it was not the one. Will send v2 tomorrow.

Thanks,
Song
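
[Editor's sketch, for reference only] A minimal illustration of the CO-RE probing approach discussed above, not the actual v2 patch. It assumes the vmlinux.h used to build runqslower provides struct trace_event_raw_sched_switch (generated for the non-raw sched:sched_switch tracepoint), uses the same bpf_core_field_exists() check Andrii suggests, and keeps runqslower's handle__sched_switch name; the ctx-slot mapping in the comments follows the TP_PROTO change in fa2c3254d7cf, and the rest of the handler body is omitted.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

SEC("tp_btf/sched_switch")
int handle__sched_switch(u64 *ctx)
{
	struct task_struct *prev, *next;

	if (bpf_core_field_exists(((struct trace_event_raw_sched_switch *)0)->prev_state)) {
		/* assumed new layout after fa2c3254d7cf:
		 * TP_PROTO(bool preempt, unsigned int prev_state,
		 *          struct task_struct *prev, struct task_struct *next)
		 */
		prev = (struct task_struct *)ctx[2];
		next = (struct task_struct *)ctx[3];
	} else {
		/* assumed old layout:
		 * TP_PROTO(bool preempt, struct task_struct *prev,
		 *          struct task_struct *next)
		 */
		prev = (struct task_struct *)ctx[1];
		next = (struct task_struct *)ctx[2];
	}

	/* ... existing runqslower logic using prev/next goes here ... */
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

As noted in the thread, the verifier still tracks the task_struct BTF type for the ctx slots during program validation, so only the slot selection needs to vary between kernels.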