On 5/3/23 4:06 PM, Alexei Starovoitov wrote:
On Wed, May 03, 2023 at 02:57:03PM -0700, Martin KaFai Lau wrote:
On 5/3/23 11:48 AM, Alexei Starovoitov wrote:
What it means that sleepable progs using hashmap will be able to avoid uaf with bpf_rcu_read_lock().
Without explicit bpf_rcu_read_lock() it's still safe and equivalent to existing behavior of bpf_mem_alloc.
(while your proposed BPF_MA_FREE_AFTER_RCU_GP flavor is not safe to use in hashtab with sleepable progs)
After that we can unconditionally remove rcu_head/call_rcu from bpf_cpumask and improve usability of bpf_obj_drop.
Probably usage of bpf_mem_alloc in local storage can be simplified as well.
Martin wdyt?
If the bpf prog always does a bpf_rcu_read_lock() before accessing the
(e.g.) task local storage, it can remove the reuse_now conditions in the
bpf_local_storage and directly call the bpf_mem_cache_free().
The only corner use case is when the bpf_prog or syscall does
bpf_task_storage_delete() instead of having the task storage stays with the
whole lifetime of the task_struct. Using REUSE_AFTER_RCU_GP will be a change
of this uaf guarantee to the sleepable program but it is still safe because
it is freed after tasks_trace gp. We could take this chance to align this
behavior of the local storage map to the other bpf maps.
For BPF_MA_FREE_AFTER_RCU_GP, there are cases that the bpf local storage
knows it can be freed without waiting tasks_trace gp. However, only
task/cgroup storages are in bpf ma and I don't believe this optimization
matter much for them. I would rather focus on the REUSE_AFTER_RCU_GP first.
I'm confused which REUSE_AFTER_RCU_GP you meant.
What I proposed above is REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace
Regarding REUSE_AFTER_RCU_GP, I meant
REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace.
Hou's proposals: 1. BPF_MA_REUSE_AFTER_two_RCUs_GP 2. BPF_MA_FREE_AFTER_single_RCU_GP
It probably is where the confusion is. I thought Hou's BPF_MA_REUSE_AFTER_RCU_GP
is already REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace. From the commit
message:
" ... So introduce BPF_MA_REUSE_AFTER_RCU_GP to solve these problems. For
BPF_MA_REUSE_AFTER_GP, the freed objects are reused only after one RCU
grace period and may be returned back to slab system after another
RCU-tasks-trace grace period. ..."
[I assumed BPF_MA_REUSE_AFTER_GP is just a typo of BPF_MA_REUSE_AFTER_"RCU"_GP]
If I'm reading bpf_local_storage correctly it can remove reuse_now logic
in all conditions with REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace.
Right, for smap->bpf_ma == true (cgroup and task storage), all reuse_now logic
can be gone and directly use the bpf_mem_cache_free(). Potentially the sk/inode
can also move to bpf_ma after running some benchmark. This will simplify things
a lot. For sk storage, the reuse_now was there to avoid the unnecessary
tasks_trace gp because performance impact was reported on sk storage where
connections can be open-and-close very frequently.