David Vernet wrote:
> On Thu, Nov 17, 2022 at 02:36:50PM -0800, John Fastabend wrote:
> > David Vernet wrote:
> > > On Thu, Nov 17, 2022 at 01:03:45PM -0800, John Fastabend wrote:
> > > > David Vernet wrote:
> > > > > Now that BPF supports adding new kernel functions with kfuncs, and
> > > > > storing kernel objects in maps with kptrs, we can add a set of kfuncs
> > > > > which allow struct task_struct objects to be stored in maps as
> > > > > referenced kptrs.
> > > > >
> > > > > The possible use cases for doing this are plentiful. During tracing,
> > > > > for example, it would be useful to be able to collect some tasks that
> > > > > performed a certain operation, and then periodically summarize who they
> > > > > are, which cgroup they're in, how much CPU time they've utilized, etc.
> > > > > Doing this now would require storing the tasks' pids along with some
> > > > > relevant data to be exported to user space, and later associating the
> > > > > pids to tasks in other event handlers where the data is recorded.
> > > > > Another useful by-product of this is that it allows a program to pin a
> > > > > task in a BPF program, and by proxy therefore also e.g. pin its task
> > > > > local storage.
> > > >
> > > > Sorry wasn't obvious to me (late to the party so if it was in some
> > > > other v* described apologies). Can we say something about the life
> > > > cycle of this acquired task_structs because they are incrementing
> > > > the ref cnt on the task struct they have potential to impact system.
> > >
> > > We should probably add an entire docs page which describes how kptrs
> > > work, and I am happy to do that (ideally in a follow-on patch set if
> > > that's OK with you). In general I think it would be useful to include
> > > docs for any general-purpose kfuncs like the ones proposed in this set.
> >
> > Sure, I wouldn't require that for your series though fwiw.
>
> Sounds good to me
>
> [...]
>
> > > > quick question.
> > > > If you put acquired task struct in a map what
> > > > happens if user side deletes the entry? Presumably this causes the
> > > > release to happen and the task_struct is good to go. Did I miss
> > > > the logic? I was thinking you would have something in bpf_map_free_kptrs
> > > > and a type callback to release() the refcnt?
> > >
> > > Someone else can chime in here to correct me if I'm wrong, but AFAIU
> > > this is handled by the map implementations calling out to
> > > bpf_obj_free_fields() to invoke the kptr destructor when the element is
> > > destroyed. See [3] and [4] for examples of where they're called from the
> > > arraymap and hashmap logic respectively. This is how the destructors are
> > > similarly invoked when the maps are destroyed.
> >
> > Yep I found the dtor() gets populated in btf.c and apparently needed
> > to repull my local tree because I missed it. Thanks for the detailed
> > response.
> >
> > And last thing I was checking is because KF_SLEEPABLE is not set
> > this should be blocked from running on sleepable progs which would
> > break the call_rcu in the destructor. Maybe small nit, not sure
> > its worth it but might be nice to annotate the helper description
> > with a note, "will not work on sleepable progs" or something to
> > that effect.
>
> KF_SLEEPABLE is used to indicate whether the kfunc _itself_ may sleep,
> not whether the calling program can be sleepable. call_rcu() doesn't
> block, so no need to mark the kfunc as KF_SLEEPABLE. The key is that if
> a kfunc is sleepable, non-sleepable programs are not able to call it
> (and this is enforced in the verifier).

OK but should these helpers be allowed in sleepable progs? I think
not. What stops this, (using your helpers):

  cpu0                                       cpu1
  ----
  v = insert_lookup_task(task)
  kptr = bpf_kptr_xchg(&v->task, NULL);
  if (!kptr)
    return 0;
                                             map_delete_elem()
                                                put_task()
                                                    rcu_call
  do_something_might_sleep()
                                                       put_task_struct
                                                           ... free
  kptr->[free'd memory]

the insert_lookup_task will bump the refcnt on the acquire on map
insert. But the lookup doesn't do anything to the refcnt and the
map_delete_elem will delete it. We have a check for spin_lock
types to stop them from being in sleepable progs. Did I miss a
similar check for these?

Thanks again