Re: [PATCH v4 bpf-next 09/14] bpf: Allow reuse from waiting_for_gp_ttrace list.

Hou Tao <houtao@xxxxxxxxxxxxxxx> · Sat, 8 Jul 2023 15:03:40 +0800



Hi,

On 7/8/2023 1:47 AM, Paul E. McKenney wrote:
> On Fri, Jul 07, 2023 at 09:11:22AM -0700, Alexei Starovoitov wrote:
>> On Thu, Jul 6, 2023 at 9:37 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
SNIP
>>> I guess you're assuming that alloc_bulk() from irq_work
>>> is running within rcu_tasks_trace critical section,
>>> so __free_rcu_tasks_trace() callback will execute after
>>> irq work completed?
>>> I don't think that's the case.
>>> Yes. The following is my original thoughts. Correct me if I was wrong:
>>>
>>> 1. llist_del_first() must be running concurrently with llist_del_all().
>>> If llist_del_first() runs after llist_del_all(), it will return NULL
>>> directly.
>>> 2. call_rcu_tasks_trace() must happen after llist_del_all(), else the
>>> elements in free_by_rcu_ttrace will not be freed back to slab.
>>> 3. call_rcu_tasks_trace() will wait for one tasks trace RCU grace period
>>> to call __free_rcu_tasks_trace()
>>> 4. llist_del_first() in running in an context with irq-disabled, so the
>>> tasks trace RCU grace period will wait for the end of llist_del_first()
>>>
>>> It seems you thought step 4) is not true, right ?
>> Yes. I think so. For two reasons:
>>
>> 1.
>> I believe irq disabled region isn't considered equivalent
>> to rcu_read_lock_trace() region.
>>
>> Paul,
>> could you clarify ?
> You are correct, Alexei.  Unlike vanilla RCU, RCU Tasks Trace does not
> count irq-disabled regions of code as readers.

I see. But I still have one question: considering that in current
implementation one Tasks Trace RCU grace period implies one vanilla RCU
grace period (aka rcu_trace_implies_rcu_gp), so in my naive
understanding of RCU, does that mean __free_rcu_tasks_trace() will be
invoked after the expiration of current Task Trace RCU grace period,
right ? And does it also mean __free_rcu_tasks_trace() will be invoked
after the expiration of current vanilla RCU grace period, right ? If
these two conditions above are true, does it mean
__free_rcu_tasks_trace() will wait for the irq-disabled code reigion ?
> But why not just put an rcu_read_lock_trace() and a matching
> rcu_read_unlock_trace() within that irq-disabled region of code?
>
> For completeness, if it were not for CONFIG_TASKS_TRACE_RCU_READ_MB,
> Hou Tao would be correct from a strict current-implementation
> viewpoint.  The reason is that, given the current implementation in
> CONFIG_TASKS_TRACE_RCU_READ_MB=n kernels, a task must either block or
> take an IPI in order for the grace-period machinery to realize that this
> task is done with all prior readers.

Thanks for the detailed explanation.
> However, we need to account for the possibility of IPI-free
> implementations, for example, if the real-time guys decide to start
> making heavy use of BPF sleepable programs.  They would then insist on
> getting rid of those IPIs for CONFIG_PREEMPT_RT=y kernels.  At which
> point, irq-disabled regions of code will absolutely not act as
> RCU tasks trace readers.
>
> Again, why not just put an rcu_read_lock_trace() and a matching
> rcu_read_unlock_trace() within that irq-disabled region of code?
>
> Or maybe there is a better workaround.

Yes. I think we could use rcu_read_{lock,unlock}_trace to fix the ABA
problem for free_by_rcu_ttrace.
>
>> 2.
>> Even if 1 is incorrect, in RT llist_del_first() from alloc_bulk()
>> runs "in a per-CPU thread in preemptible context."
>> See irq_work_run_list.
> Agreed, under RT, "interrupt handlers" often run in task context.

Yes, I missed that. I misread alloc_bulk(), and it seems it only does
inc_active() for c->free_llist.
> 						Thanx, Paul