On 9/4/24 3:21 PM, Martin KaFai Lau wrote:
On 8/28/24 4:24 PM, Alexei Starovoitov wrote:
@@ -714,6 +869,11 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 			field->kptr.dtor(xchgd_field);
 		}
 		break;
+	case BPF_UPTR:
+		if (*(void **)field_ptr)
+			bpf_obj_unpin_uptr(field, *(void **)field_ptr);
+		*(void **)field_ptr = NULL;
This one will be called from
task_storage_delete->bpf_selem_free->bpf_obj_free_fields,
and even if unpin were safe to do from that context,
we cannot just do:
*(void **)field_ptr = NULL;
since a bpf prog might be running in parallel;
it could have just read that addr and now be using it.
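Roughly this window (just a sketch of the interleaving):

  bpf prog (rcu read side)        task_storage_delete()
  ------------------------        ------------------------------
  udata = ptr->udata;
    /* reads the uptr addr */
                                  bpf_selem_free()
                                    bpf_obj_free_fields()
                                      bpf_obj_unpin_uptr(...)
                                      *(void **)field_ptr = NULL;
  *udata = ...;
    /* page already unpinned */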
My first thought for fixing this was to split
bpf_obj_free_fields() into the current one plus a
bpf_obj_free_fields_after_gp()
that does the above unpin bit,
and call the latter from bpf_selem_free_rcu(),
while bpf_obj_free_fields() called from bpf_selem_free()
would not touch the uptr.
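Something like this (just a sketch; the loop mirrors the shape of
bpf_obj_free_fields() and reuses bpf_obj_unpin_uptr() from this patch):

	void bpf_obj_free_fields_after_gp(const struct btf_record *rec, void *obj)
	{
		int i;

		for (i = 0; i < rec->cnt; i++) {
			const struct btf_field *field = &rec->fields[i];
			void *field_ptr = obj + field->offset;

			/* only the uptr unpin is delayed past the gp */
			if (field->type != BPF_UPTR)
				continue;
			if (*(void **)field_ptr)
				bpf_obj_unpin_uptr(field, *(void **)field_ptr);
			*(void **)field_ptr = NULL;
		}
	}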
But after digging further I realized that task_storage
has already switched to using bpf_ma, so the above won't work.
So we need something similar to the BPF_KPTR_REF logic:
xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
and then delay the uptr unpin for that address via call_rcu.
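e.g. (a sketch only; struct uptr_unpin_cb and uptr_unpin_rcu() are made up,
and allocating in the free path would need more thought):

	struct uptr_unpin_cb {
		struct rcu_head rcu;
		const struct btf_field *field;
		void *uptr;
	};

	static void uptr_unpin_rcu(struct rcu_head *rcu)
	{
		struct uptr_unpin_cb *cb;

		cb = container_of(rcu, struct uptr_unpin_cb, rcu);
		bpf_obj_unpin_uptr(cb->field, cb->uptr);
		kfree(cb);
	}

	/* in bpf_obj_free_fields()'s switch: */
	case BPF_UPTR:
		xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
		if (xchgd_field) {
			struct uptr_unpin_cb *cb;

			cb = kmalloc(sizeof(*cb), GFP_ATOMIC);
			if (cb) {
				cb->field = field;
				cb->uptr = xchgd_field;
				/* unpin only after readers are done */
				call_rcu(&cb->rcu, uptr_unpin_rcu);
			}
		}
		break;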
Any better ideas?
I think the existing reuse_now arg of bpf_selem_free() can be used. reuse_now
(renamed from the earlier use_trace_rcu) was added to avoid call_rcu_tasks_trace
for the common case.
selem (of type "struct bpf_local_storage_elem") is the object exposed to the bpf prog.
bpf_selem_free knows whether a selem can be reused immediately based on the
caller. It is currently flagged in the reuse_now arg: "bpf_selem_free(...., bool
reuse_now)".
If a selem cannot be reuse_now (i.e. == false), it is currently going through
"call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu)". We can do
unpin_user_page() in the rcu callback.
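Something like this (a sketch; bpf_obj_unpin_uptrs() is a made-up helper
that walks the map's btf_record and unpins the BPF_UPTR fields of the
selem's data):

	static void bpf_selem_free_trace_rcu(struct rcu_head *rcu)
	{
		struct bpf_local_storage_elem *selem;

		selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
		/* no bpf prog can hold the selem after the tasks-trace gp,
		 * so it is safe to unpin the user pages here
		 */
		bpf_obj_unpin_uptrs(selem);
		/* ... then the existing free path ... */
	}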
A selem can be reuse_now (i.e. == true) if the selem is no longer needed because
either its owner (i.e. the task_struct here) is going away in free_task() or the
bpf map is being destroyed in bpf_local_storage_map_free(). No bpf prog should
have a hold on the selem at this point. I think for these two cases,
unpin_user_page() can be called directly in bpf_selem_free().
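i.e. roughly (same made-up bpf_obj_unpin_uptrs() helper as above):

	void bpf_selem_free(struct bpf_local_storage_elem *selem,
			    struct bpf_local_storage_map *smap,
			    bool reuse_now)
	{
		if (!reuse_now) {
			/* a prog may still hold the selem; unpin in the
			 * rcu callback after the gp
			 */
			call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);
			return;
		}

		/* owner is going away or the map is being freed:
		 * no prog holds the selem, unpin immediately
		 */
		bpf_obj_unpin_uptrs(selem);
		/* ... then the existing immediate free ... */
	}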
One existing bug: from looking at patch 6, I don't think the free_task() case
can be "reuse_now == true" anymore, because the bpf_task_release kfunc does not
mark the previously obtained task storage pointer invalid:
	data_task = bpf_task_from_pid(parent_pid);
	ptr = bpf_task_storage_get(&datamap, data_task, 0, ...);
	bpf_task_release(data_task);
	if (!ptr)
		return 0;

	/* The prog still holds a valid task storage ptr. */
	udata = ptr->udata;
It can be fixed by marking the ref_obj_id of the "ptr". Although it would be more
correct to make the task storage "ptr" invalid after bpf_task_release(), that may
break existing progs.
The same issue is probably true for cgroup_storage. There is no release kfunc
for inode and sk, so inode and sk storage should be fine.