Hi, On 7/8/2023 1:47 AM, Paul E. McKenney wrote: > On Fri, Jul 07, 2023 at 09:11:22AM -0700, Alexei Starovoitov wrote: >> On Thu, Jul 6, 2023 at 9:37 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: SNIP >>> I guess you're assuming that alloc_bulk() from irq_work >>> is running within rcu_tasks_trace critical section, >>> so __free_rcu_tasks_trace() callback will execute after >>> irq work completed? >>> I don't think that's the case. >>> Yes. The following is my original thoughts. Correct me if I was wrong: >>> >>> 1. llist_del_first() must be running concurrently with llist_del_all(). >>> If llist_del_first() runs after llist_del_all(), it will return NULL >>> directly. >>> 2. call_rcu_tasks_trace() must happen after llist_del_all(), else the >>> elements in free_by_rcu_ttrace will not be freed back to slab. >>> 3. call_rcu_tasks_trace() will wait for one tasks trace RCU grace period >>> to call __free_rcu_tasks_trace() >>> 4. llist_del_first() in running in an context with irq-disabled, so the >>> tasks trace RCU grace period will wait for the end of llist_del_first() >>> >>> It seems you thought step 4) is not true, right ? >> Yes. I think so. For two reasons: >> >> 1. >> I believe irq disabled region isn't considered equivalent >> to rcu_read_lock_trace() region. >> >> Paul, >> could you clarify ? > You are correct, Alexei. Unlike vanilla RCU, RCU Tasks Trace does not > count irq-disabled regions of code as readers. I see. But I still have one question: considering that in current implementation one Tasks Trace RCU grace period implies one vanilla RCU grace period (aka rcu_trace_implies_rcu_gp), so in my naive understanding of RCU, does that mean __free_rcu_tasks_trace() will be invoked after the expiration of current Task Trace RCU grace period, right ? And does it also mean __free_rcu_tasks_trace() will be invoked after the expiration of current vanilla RCU grace period, right ? If these two conditions above are true, does it mean __free_rcu_tasks_trace() will wait for the irq-disabled code reigion ? > But why not just put an rcu_read_lock_trace() and a matching > rcu_read_unlock_trace() within that irq-disabled region of code? > > For completeness, if it were not for CONFIG_TASKS_TRACE_RCU_READ_MB, > Hou Tao would be correct from a strict current-implementation > viewpoint. The reason is that, given the current implementation in > CONFIG_TASKS_TRACE_RCU_READ_MB=n kernels, a task must either block or > take an IPI in order for the grace-period machinery to realize that this > task is done with all prior readers. Thanks for the detailed explanation. > However, we need to account for the possibility of IPI-free > implementations, for example, if the real-time guys decide to start > making heavy use of BPF sleepable programs. They would then insist on > getting rid of those IPIs for CONFIG_PREEMPT_RT=y kernels. At which > point, irq-disabled regions of code will absolutely not act as > RCU tasks trace readers. > > Again, why not just put an rcu_read_lock_trace() and a matching > rcu_read_unlock_trace() within that irq-disabled region of code? > > Or maybe there is a better workaround. Yes. I think we could use rcu_read_{lock,unlock}_trace to fix the ABA problem for free_by_rcu_ttrace. > >> 2. >> Even if 1 is incorrect, in RT llist_del_first() from alloc_bulk() >> runs "in a per-CPU thread in preemptible context." >> See irq_work_run_list. > Agreed, under RT, "interrupt handlers" often run in task context. Yes, I missed that. I misread alloc_bulk(), and it seems it only does inc_active() for c->free_llist. > Thanx, Paul