On 9/6/19 9:30 PM, Gautam Kulkarni wrote:
> Hi,
>
> We are evaluating eBPF as a means to account voluntary and
> non-voluntary context switches against cgroups. Currently, this
> information is only present in the task_struct for an individual
> process and not in the cgroup data structure.
>
> With this context, I was looking for recommendation on the following
> possible approaches:
>
> 1. Use the existing tracepoint (trace_sched_switch) as it exists here
> with BPF_PROG_TYPE_TRACEPOINT:
> https://github.com/torvalds/linux/blob/master/kernel/sched/core.c#L3877
> However, based on the trace format, the kernel does not expose
> prev->nivcsw and prev->nvcsw. Due to this, I feel like this approach
> may not be feasible. Is my understanding correct?

You can use BPF_RAW_TRACEPOINT_OPEN; the `prev` argument will then be
available to BPF programs.

>
> 2. Attach a kprobe to __schedule() and use BPF_PROG_TYPE_KPROBE
> This will allow us access to the prev pointer. From the prev
> (task_struct), we can access the cgroup and use an eBPF map to
> accumulate per cgroup counts of context switches.
>
> 3. Implement a kernel module that attaches a kprobe to __schedule()
> and implement the map in the kprobe handler.
>
> 4. Modify the kernel to have context switch information in task_group.
> Would this be something that would make sense to the community?
>
> I would highly appreciate any feedback on this.
>
> Regards,
> Gautam
>
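
On point 1, here is a rough sketch of the per-cgroup accounting that a
raw tracepoint handler on sched_switch could perform. It is a plain
userspace C model, not a loadable BPF program: the fixed-size table
stands in for a BPF hash map, and the names csw_counts, lookup_or_init,
csw_find and account_switch are hypothetical. It assumes the handler
can see the `preempt` flag and `prev->state`, and it mirrors the
`!preempt && prev->state` check in __schedule(). Corner cases, such as
a task with a pending signal, may be classified differently by the
kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CGROUPS 64 /* arbitrary table size for this sketch */

/* Per-cgroup counters; in a real program this would be a map value. */
struct csw_counts {
    uint64_t cgroup_id;
    uint64_t voluntary;
    uint64_t involuntary;
    bool used;
};

static struct csw_counts table[MAX_CGROUPS];

/* Find the slot for cgroup_id, creating it if absent (linear probing).
 * Returns NULL when the table is full. */
struct csw_counts *lookup_or_init(uint64_t cgroup_id)
{
    for (size_t i = 0; i < MAX_CGROUPS; i++) {
        struct csw_counts *e = &table[(cgroup_id + i) % MAX_CGROUPS];
        if (!e->used) {
            e->used = true;
            e->cgroup_id = cgroup_id;
            return e;
        }
        if (e->cgroup_id == cgroup_id)
            return e;
    }
    return NULL;
}

/* Read-only lookup; returns NULL if the cgroup was never seen. */
const struct csw_counts *csw_find(uint64_t cgroup_id)
{
    for (size_t i = 0; i < MAX_CGROUPS; i++) {
        const struct csw_counts *e = &table[(cgroup_id + i) % MAX_CGROUPS];
        if (!e->used)
            return NULL;
        if (e->cgroup_id == cgroup_id)
            return e;
    }
    return NULL;
}

/* Model of the handler body: preempt and prev_state correspond to the
 * tracepoint's preempt flag and prev->state; cgroup_id is assumed to
 * have been resolved from prev by the caller. */
void account_switch(bool preempt, long prev_state, uint64_t cgroup_id)
{
    struct csw_counts *e = lookup_or_init(cgroup_id);
    if (!e)
        return; /* table full: drop the sample */

    if (!preempt && prev_state != 0)
        e->voluntary++;   /* task blocked on its own */
    else
        e->involuntary++; /* task was preempted or still runnable */
}
```

Calling account_switch() once per switch event and then reading a
cgroup's entry with csw_find() gives the two totals. In an actual BPF
program the cgroup id would have to come from `prev` or from a helper
such as bpf_get_current_cgroup_id(), which reports the cgroup v2 id of
the current task.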