On Fri, Jul 7, 2023 at 1:47 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Fri, Jul 07, 2023 at 09:11:22AM -0700, Alexei Starovoitov wrote:
> > On Thu, Jul 6, 2023 at 9:37 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > On 7/7/2023 12:16 PM, Alexei Starovoitov wrote:
> > > > On Thu, Jul 6, 2023 at 8:39 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > > >> Hi,
> > > >>
> > > >> On 7/7/2023 10:12 AM, Alexei Starovoitov wrote:
> > > >>> On Thu, Jul 6, 2023 at 7:07 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > > >>>> Hi,
> > > >>>>
> > > >>>> On 7/6/2023 11:34 AM, Alexei Starovoitov wrote:
> > > >>>>
> > > SNIP
> > > >>> and it's not just waiting_for_gp_ttrace. free_by_rcu_ttrace is similar.
> > > >> I think free_by_rcu_ttrace is different, because the reuse is only
> > > >> possible after one tasks trace RCU grace period as shown below, and the
> > > >> concurrent llist_del_first() must have been completed when the head is
> > > >> reused and re-added into free_by_rcu_ttrace again.
> > > >>
> > > >> // c0->free_by_rcu_ttrace
> > > >> A -> B -> C -> nil
> > > >>
> > > >> P1:
> > > >> alloc_bulk()
> > > >>     llist_del_first(&c->free_by_rcu_ttrace)
> > > >>         entry = A
> > > >>         next = B
> > > >>
> > > >> P2:
> > > >> do_call_rcu_ttrace()
> > > >>     // c->free_by_rcu_ttrace->first = NULL
> > > >>     llist_del_all(&c->free_by_rcu_ttrace)
> > > >>         move to c->waiting_for_gp_ttrace
> > > >>
> > > >> P1:
> > > >>     llist_del_first()
> > > >>         return NULL
> > > >>
> > > >> // A is only reusable after one tasks trace RCU grace period
> > > >> // llist_del_first() must have been completed
> > > > "must have been completed" ?
> > > >
> > > > I guess you're assuming that alloc_bulk() from irq_work
> > > > is running within rcu_tasks_trace critical section,
> > > > so __free_rcu_tasks_trace() callback will execute after
> > > > irq work completed?
> > > > I don't think that's the case.
> > >
> > > Yes. The following are my original thoughts. Correct me if I am wrong:
> > >
> > > 1. llist_del_first() must be running concurrently with llist_del_all().
> > >    If llist_del_first() runs after llist_del_all(), it will return NULL
> > >    directly.
> > > 2. call_rcu_tasks_trace() must happen after llist_del_all(), else the
> > >    elements in free_by_rcu_ttrace will not be freed back to slab.
> > > 3. call_rcu_tasks_trace() will wait for one tasks trace RCU grace period
> > >    to call __free_rcu_tasks_trace().
> > > 4. llist_del_first() is running in a context with irqs disabled, so the
> > >    tasks trace RCU grace period will wait for the end of llist_del_first().
> > >
> > > It seems you thought step 4) is not true, right?
> >
> > Yes. I think so. For two reasons:
> >
> > 1.
> > I believe irq disabled region isn't considered equivalent
> > to rcu_read_lock_trace() region.
> >
> > Paul,
> > could you clarify?
>
> You are correct, Alexei.  Unlike vanilla RCU, RCU Tasks Trace does not
> count irq-disabled regions of code as readers.
>
> But why not just put an rcu_read_lock_trace() and a matching
> rcu_read_unlock_trace() within that irq-disabled region of code?
>
> For completeness, if it were not for CONFIG_TASKS_TRACE_RCU_READ_MB,
> Hou Tao would be correct from a strict current-implementation
> viewpoint.  The reason is that, given the current implementation in
> CONFIG_TASKS_TRACE_RCU_READ_MB=n kernels, a task must either block or
> take an IPI in order for the grace-period machinery to realize that this
> task is done with all prior readers.
>
> However, we need to account for the possibility of IPI-free
> implementations, for example, if the real-time guys decide to start
> making heavy use of BPF sleepable programs.  They would then insist on
> getting rid of those IPIs for CONFIG_PREEMPT_RT=y kernels.  At which
> point, irq-disabled regions of code will absolutely not act as
> RCU tasks trace readers.
>
> Again, why not just put an rcu_read_lock_trace() and a matching
> rcu_read_unlock_trace() within that irq-disabled region of code?

If I remember correctly, the general guidance is to always put an
explicit marker when the code is in an RCU reader, instead of relying
on implementation details. So the suggestion to put the marker, rather
than relying on IRQ disabling, aligns with that.

Thanks.
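
For concreteness, a minimal sketch of what I understand the suggestion
to be. This is not the actual alloc_bulk() code path, and the helper
name pop_with_trace_reader() is purely illustrative; it only shows the
lockless pop wrapped in an explicit RCU Tasks Trace reader so that the
grace period started by call_rcu_tasks_trace() is guaranteed to wait
for an in-flight llist_del_first():

	#include <linux/llist.h>
	#include <linux/rcupdate_trace.h>

	/* Pop one element while inside an explicit RCU Tasks Trace reader. */
	static struct llist_node *pop_with_trace_reader(struct llist_head *head)
	{
		struct llist_node *llnode;

		/*
		 * Unlike vanilla RCU, RCU Tasks Trace does not treat the
		 * surrounding irq-disabled region as a read-side critical
		 * section, so this explicit marker is what orders the pop
		 * against __free_rcu_tasks_trace(), independent of IPIs.
		 */
		rcu_read_lock_trace();
		llnode = llist_del_first(head);
		rcu_read_unlock_trace();

		return llnode;
	}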