Hi Paul,

On 11/10/2023 9:45 AM, Paul E. McKenney wrote:
> On Fri, Nov 10, 2023 at 09:06:56AM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 11/10/2023 3:55 AM, Paul E. McKenney wrote:
>>> On Thu, Nov 09, 2023 at 07:55:50AM -0800, Alexei Starovoitov wrote:
>>>> On Wed, Nov 8, 2023 at 11:26 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> On 11/9/2023 2:36 PM, Martin KaFai Lau wrote:
>>>>>> On 11/7/23 6:06 AM, Hou Tao wrote:
>>>>>>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>>>>>>
>>>>>>> bpf_map_of_map_fd_get_ptr() will convert the map fd to the pointer
>>>>>>> saved in map-in-map. bpf_map_of_map_fd_put_ptr() will release the
>>>>>>> pointer saved in map-in-map. These two helpers will be used by the
>>>>>>> following patches to fix the use-after-free problems for map-in-map.
>>>>>>>
>>>>>>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
>>>>>>> ---
>>>>>>>   kernel/bpf/map_in_map.c | 51 +++++++++++++++++++++++++++++++++++++++++
>>>>>>>   kernel/bpf/map_in_map.h | 11 +++++++--
>>>>>>>   2 files changed, 60 insertions(+), 2 deletions(-)
>>>>>>>
>>>>> SNIP
>>>>>>> +void bpf_map_of_map_fd_put_ptr(void *ptr, bool need_defer)
>>>>>>> +{
>>>>>>> +	struct bpf_inner_map_element *element = ptr;
>>>>>>> +
>>>>>>> +	/* Do bpf_map_put() after a RCU grace period and a tasks trace
>>>>>>> +	 * RCU grace period, so it is certain that the bpf program which
>>>>>>> +	 * is manipulating the map now has exited when bpf_map_put() is
>>>>>>> +	 * called.
>>>>>>> +	 */
>>>>>>> +	if (need_defer)
>>>>>> "need_defer" should only happen from the syscall cmd? Instead of
>>>>>> adding rcu_head to each element, how about
>>>>>> "synchronize_rcu_mult(call_rcu, call_rcu_tasks)" here?
>>>>> No. I tried that method before, but it didn't work due to a dead-lock
>>>>> (I will mention that in the commit message in v2). The reason is that
>>>>> a bpf syscall program may also do a map update through the sys_bpf
>>>>> helper. Because a bpf syscall program runs in a sleepable context
>>>>> with rcu_read_lock_trace held, calling synchronize_rcu_mult(call_rcu,
>>>>> call_rcu_tasks) there leads to dead-lock.
>>>> Dead-lock? why?
>>>>
>>>> I think it's legal to do call_rcu_tasks_trace() while inside an RCU CS
>>>> or an RCU tasks trace CS.
>>> Just confirming that this is the case.  If invoking
>>> call_rcu_tasks_trace() under either rcu_read_lock() or
>>> rcu_read_lock_trace() deadlocks, then there is a bug that needs
>>> fixing.  ;-)
>> The dead-lock case is calling synchronize_rcu_mult(call_rcu,
>> call_rcu_tasks_trace) under rcu_read_lock_trace(), and I think that
>> behavior is expected. Calling call_rcu_tasks_trace() with
>> rcu_read_lock_trace() being held is OK.
> Very good, you are quite right.  In this particular case, deadlock is
> expected behavior.
>
> The problem here is that synchronize_rcu_mult() doesn't just invoke its
> arguments; instead, it also waits for all of the corresponding grace
> periods to complete.  But if you call it while under the protection of
> rcu_read_lock_trace(), then synchronize_rcu_mult(call_rcu_tasks_trace)
> cannot return until the corresponding rcu_read_unlock_trace() is
> reached, but that rcu_read_unlock_trace() cannot be reached until after
> synchronize_rcu_mult(call_rcu_tasks_trace) returns.
>
> (I did leave out the call_rcu argument because it does not participate
> in this particular deadlock.)

Got it. Thanks for the detailed explanation. For anyone reading the
thread later, I have put a minimal sketch of both patterns below.

>
> 							Thanx, Paul
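
First, a minimal sketch (not code from the patch) contrasting the two
patterns discussed above: queueing a callback under a tasks-trace
read-side critical section is fine, while waiting for the grace period
there deadlocks. It assumes a kernel with CONFIG_TASKS_TRACE_RCU; the
function names (demo_cb and friends) are made up for illustration.

#include <linux/rcupdate.h>
#include <linux/rcupdate_trace.h>
#include <linux/rcupdate_wait.h>

static void demo_cb(struct rcu_head *rhp)
{
	/* Invoked once a tasks-trace RCU grace period has elapsed. */
}

static void queue_is_fine(struct rcu_head *rhp)
{
	rcu_read_lock_trace();
	/*
	 * OK: call_rcu_tasks_trace() merely queues the callback and
	 * returns immediately, so it is legal inside a tasks-trace
	 * read-side critical section.
	 */
	call_rcu_tasks_trace(rhp, demo_cb);
	rcu_read_unlock_trace();
}

static void waiting_deadlocks(void)
{
	rcu_read_lock_trace();
	/*
	 * DEADLOCK: synchronize_rcu_mult() waits for the tasks-trace
	 * grace period to complete, but that grace period cannot end
	 * until the rcu_read_unlock_trace() below executes, which it
	 * never will because we are stuck here waiting.
	 */
	synchronize_rcu_mult(call_rcu, call_rcu_tasks_trace);
	rcu_read_unlock_trace();
}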
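Second, a hedged sketch of the non-blocking alternative the series
takes instead: embed an rcu_head in each element and chain the two
callback flavors, so the final put happens only after both a
tasks-trace RCU grace period and a plain RCU grace period, without
ever sleeping. The element type and function names are illustrative,
not the patch's actual identifiers.

#include <linux/container_of.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct demo_elem {
	struct rcu_head rcu;
	void *inner_map;	/* stand-in for the real payload */
};

static void demo_free_rcu(struct rcu_head *rhp)
{
	struct demo_elem *e = container_of(rhp, struct demo_elem, rcu);

	/* Both grace periods have elapsed; the final put/free is safe. */
	kfree(e);
}

static void demo_free_tasks_trace(struct rcu_head *rhp)
{
	/* Tasks-trace GP is done; now also wait for a plain RCU GP. */
	call_rcu(rhp, demo_free_rcu);
}

static void demo_put_deferred(struct demo_elem *e)
{
	/* Only queues work, so safe even under rcu_read_lock_trace(). */
	call_rcu_tasks_trace(&e->rcu, demo_free_tasks_trace);
}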