> On Jan 11, 2021, at 9:49 AM, Yonghong Song <yhs@xxxxxx> wrote:
> 
> 
> 
> On 1/8/21 3:19 PM, Song Liu wrote:
>> Replace hashtab with task local storage in runqslower. This improves the
>> performance of these BPF programs. The following table summarizes average
>> runtime of these programs, in nanoseconds:
>> 
>>                             task-local   hash-prealloc   hash-no-prealloc
>> handle__sched_wakeup               125             340               3124
>> handle__sched_wakeup_new          2812            1510               2998
>> handle__sched_switch               151             208                991
>> 
>> Note that, task local storage gives better performance than hashtab for
>> handle__sched_wakeup and handle__sched_switch. On the other hand, for
>> handle__sched_wakeup_new, task local storage is slower than hashtab with
>> prealloc. This is because handle__sched_wakeup_new accesses the data for
>> the first time, so it has to allocate the data for task local storage.
>> Once the initial allocation is done, subsequent accesses, as those in
>> handle__sched_wakeup, are much faster with task local storage. If we
>> disable hashtab prealloc, task local storage is much faster for all 3
>> functions.
>> 
>> Signed-off-by: Song Liu <songliubraving@xxxxxx>
>> ---
>>  tools/bpf/runqslower/runqslower.bpf.c | 26 +++++++++++++++-----------
>>  1 file changed, 15 insertions(+), 11 deletions(-)
>> 
>> diff --git a/tools/bpf/runqslower/runqslower.bpf.c b/tools/bpf/runqslower/runqslower.bpf.c
>> index 1f18a409f0443..c4de4179a0a17 100644
>> --- a/tools/bpf/runqslower/runqslower.bpf.c
>> +++ b/tools/bpf/runqslower/runqslower.bpf.c
>> @@ -11,9 +11,9 @@ const volatile __u64 min_us = 0;
>>  const volatile pid_t targ_pid = 0;
>>  
>>  struct {
>> -	__uint(type, BPF_MAP_TYPE_HASH);
>> -	__uint(max_entries, 10240);
>> -	__type(key, u32);
>> +	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
>> +	__uint(map_flags, BPF_F_NO_PREALLOC);
>> +	__type(key, int);
>>  	__type(value, u64);
>>  } start SEC(".maps");
>>  
>> @@ -25,15 +25,19 @@ struct {
>>  
>>  /* record enqueue timestamp */
>>  __always_inline
>> -static int trace_enqueue(u32 tgid, u32 pid)
>> +static int trace_enqueue(struct task_struct *t)
>>  {
>> -	u64 ts;
>> +	u32 pid = t->pid;
>> +	u64 ts, *ptr;
>>  
>>  	if (!pid || (targ_pid && targ_pid != pid))
>>  		return 0;
>>  
>>  	ts = bpf_ktime_get_ns();
>> -	bpf_map_update_elem(&start, &pid, &ts, 0);
>> +	ptr = bpf_task_storage_get(&start, t, 0,
>> +				   BPF_LOCAL_STORAGE_GET_F_CREATE);
>> +	if (ptr)
>> +		*ptr = ts;
>>  	return 0;
>>  }
>>  
>> @@ -43,7 +47,7 @@ int handle__sched_wakeup(u64 *ctx)
>>  	/* TP_PROTO(struct task_struct *p) */
>>  	struct task_struct *p = (void *)ctx[0];
>>  
>> -	return trace_enqueue(p->tgid, p->pid);
>> +	return trace_enqueue(p);
>>  }
>>  
>>  SEC("tp_btf/sched_wakeup_new")
>> @@ -52,7 +56,7 @@ int handle__sched_wakeup_new(u64 *ctx)
>>  	/* TP_PROTO(struct task_struct *p) */
>>  	struct task_struct *p = (void *)ctx[0];
>>  
>> -	return trace_enqueue(p->tgid, p->pid);
>> +	return trace_enqueue(p);
>>  }
>>  
>>  SEC("tp_btf/sched_switch")
>> @@ -70,12 +74,12 @@ int handle__sched_switch(u64 *ctx)
>>  
>>  	/* ivcsw: treat like an enqueue event and store timestamp */
>>  	if (prev->state == TASK_RUNNING)
>> -		trace_enqueue(prev->tgid, prev->pid);
>> +		trace_enqueue(prev);
>>  
>>  	pid = next->pid;
>>  
>>  	/* fetch timestamp and calculate delta */
>> -	tsp = bpf_map_lookup_elem(&start, &pid);
>> +	tsp = bpf_task_storage_get(&start, next, 0, 0);
>>  	if (!tsp)
>>  		return 0;   /* missed enqueue */
> 
> Previously, hash table may overflow so we may have missed enqueue.
> Here with task local storage, is it possible to add additional pid
> filtering in the beginning of handle__sched_switch such that
> missed enqueue here can be treated as an error?

IIUC, hashtab overflow is not the only reason for a missed enqueue. If the
wakeup (which calls trace_enqueue) happens before runqslower starts, we may
still get a missed enqueue in sched_switch, no?

Thanks,
Song
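
For readers following the thread, the kind of early pid filter being
discussed might look roughly like the sketch below. This is an untested
illustration, not part of the patch: it assumes it is dropped into the same
runqslower.bpf.c as the diff above and reuses its targ_pid, start map,
trace_enqueue() and TASK_RUNNING definitions, and it elides the delta/event
emission at the end of the handler.

SEC("tp_btf/sched_switch")
int handle__sched_switch(u64 *ctx)
{
	/* TP_PROTO(bool preempt, struct task_struct *prev,
	 *	    struct task_struct *next)
	 */
	struct task_struct *prev = (struct task_struct *)ctx[1];
	struct task_struct *next = (struct task_struct *)ctx[2];
	u64 *tsp;

	/* ivcsw: treat like an enqueue event and store timestamp */
	if (prev->state == TASK_RUNNING)
		trace_enqueue(prev);

	/* hypothetical filter: mirror trace_enqueue()'s pid check for
	 * "next", so tasks that never get a timestamp are skipped before
	 * the lookup; it sits after trace_enqueue(prev) so that prev's
	 * timestamp is still recorded on involuntary switches
	 */
	if (!next->pid || (targ_pid && targ_pid != next->pid))
		return 0;

	/* even with the filter above, the lookup can return NULL when the
	 * wakeup happened before runqslower attached, so a miss still
	 * cannot simply be promoted to an error
	 */
	tsp = bpf_task_storage_get(&start, next, 0, 0);
	if (!tsp)
		return 0;	/* missed enqueue */

	/* ... compute delta, compare against min_us, emit the event ... */
	return 0;
}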