On Fri, Jul 07, 2023 at 09:11:22AM -0700, Alexei Starovoitov wrote:
> On Thu, Jul 6, 2023 at 9:37 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On 7/7/2023 12:16 PM, Alexei Starovoitov wrote:
> > > On Thu, Jul 6, 2023 at 8:39 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > >> Hi,
> > >>
> > >> On 7/7/2023 10:12 AM, Alexei Starovoitov wrote:
> > >>> On Thu, Jul 6, 2023 at 7:07 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > >>>> Hi,
> > >>>>
> > >>>> On 7/6/2023 11:34 AM, Alexei Starovoitov wrote:
> > >>>>
> > SNIP
> > >>> and it's not just waiting_for_gp_ttrace. free_by_rcu_ttrace is similar.
> > >> I think free_by_rcu_ttrace is different, because the reuse is only
> > >> possible after one tasks trace RCU grace period as shown below, and the
> > >> concurrent llist_del_first() must have been completed when the head is
> > >> reused and re-added into free_by_rcu_ttrace again.
> > >>
> > >> // c0->free_by_rcu_ttrace
> > >> A -> B -> C -> nil
> > >>
> > >> P1:
> > >> alloc_bulk()
> > >>   llist_del_first(&c->free_by_rcu_ttrace)
> > >>     entry = A
> > >>     next = B
> > >>
> > >> P2:
> > >> do_call_rcu_ttrace()
> > >>   // c->free_by_rcu_ttrace->first = NULL
> > >>   llist_del_all(&c->free_by_rcu_ttrace)
> > >>     move to c->waiting_for_gp_ttrace
> > >>
> > >> P1:
> > >>   llist_del_first()
> > >>     return NULL
> > >>
> > >> // A is only reusable after one tasks trace RCU grace period
> > >> // llist_del_first() must have been completed
> > > "must have been completed" ?
> > >
> > > I guess you're assuming that alloc_bulk() from irq_work
> > > is running within a rcu_tasks_trace critical section,
> > > so the __free_rcu_tasks_trace() callback will execute after
> > > the irq work has completed?
> > > I don't think that's the case.
> >
> > Yes. The following was my original thinking. Correct me if I am wrong:
> >
> > 1. llist_del_first() must be running concurrently with llist_del_all().
> > If llist_del_first() runs after llist_del_all(), it will return NULL
> > directly.
> > 2. call_rcu_tasks_trace() must happen after llist_del_all(), else the
> > elements in free_by_rcu_ttrace will not be freed back to the slab.
> > 3. call_rcu_tasks_trace() will wait for one tasks trace RCU grace period
> > to call __free_rcu_tasks_trace().
> > 4. llist_del_first() is running in a context with irqs disabled, so the
> > tasks trace RCU grace period will wait for the end of llist_del_first().
> >
> > It seems you think step 4) is not true, right?
>
> Yes. I think so. For two reasons:
>
> 1.
> I believe an irq-disabled region isn't considered equivalent
> to a rcu_read_lock_trace() region.
>
> Paul,
> could you clarify?

You are correct, Alexei.  Unlike vanilla RCU, RCU Tasks Trace does not
count irq-disabled regions of code as readers.

But why not just put an rcu_read_lock_trace() and a matching
rcu_read_unlock_trace() within that irq-disabled region of code?

For completeness, if it were not for CONFIG_TASKS_TRACE_RCU_READ_MB,
Hou Tao would be correct from a strict current-implementation
viewpoint.  The reason is that, given the current implementation in
CONFIG_TASKS_TRACE_RCU_READ_MB=n kernels, a task must either block or
take an IPI in order for the grace-period machinery to realize that
this task is done with all prior readers.

However, we need to account for the possibility of IPI-free
implementations, for example, if the real-time guys decide to start
making heavy use of BPF sleepable programs.  They would then insist
on getting rid of those IPIs for CONFIG_PREEMPT_RT=y kernels.  At
which point, irq-disabled regions of code will absolutely not act as
RCU Tasks Trace readers.

Again, why not just put an rcu_read_lock_trace() and a matching
rcu_read_unlock_trace() within that irq-disabled region of code?
Or maybe there is a better workaround.
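To make that suggestion concrete, here is an untested sketch against
the llist_del_first() call in alloc_bulk() (kernel/bpf/memalloc.c).
I am paraphrasing the surrounding code from memory, so the exact
context and names may differ:

	/* Untested sketch: make the lockless pop an explicit RCU Tasks
	 * Trace reader.  Running with irqs disabled is not sufficient
	 * on its own, because an IPI-free grace-period implementation
	 * would not wait for irq-disabled regions, but any
	 * implementation must wait for this read-side critical
	 * section, and therefore for the in-flight llist_del_first().
	 */
	rcu_read_lock_trace();
	obj = llist_del_first(&c->free_by_rcu_ttrace);
	rcu_read_unlock_trace();

The tasks trace RCU grace period started by the call_rcu_tasks_trace()
in do_call_rcu_ttrace() could then not end before this critical
section does, which would restore step 4) of Hou Tao's reasoning
without depending on IPIs.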
> 2.
> Even if 1 is incorrect, in RT llist_del_first() from alloc_bulk()
> runs "in a per-CPU thread in preemptible context."
> See irq_work_run_list.

Agreed, under RT, "interrupt handlers" often run in task context.

							Thanx, Paul