Re: [PATCH v4 bpf-next 09/14] bpf: Allow reuse from waiting_for_gp_ttrace list.

On Fri, Jul 07, 2023 at 09:11:22AM -0700, Alexei Starovoitov wrote:
> On Thu, Jul 6, 2023 at 9:37 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On 7/7/2023 12:16 PM, Alexei Starovoitov wrote:
> > > On Thu, Jul 6, 2023 at 8:39 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > >> Hi,
> > >>
> > >> On 7/7/2023 10:12 AM, Alexei Starovoitov wrote:
> > >>> On Thu, Jul 6, 2023 at 7:07 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > >>>> Hi,
> > >>>>
> > >>>> On 7/6/2023 11:34 AM, Alexei Starovoitov wrote:
> > >>>>
> > SNIP
> > >>> and it's not just waiting_for_gp_ttrace. free_by_rcu_ttrace is similar.
> > >> I think free_by_rcu_ttrace is different, because reuse is only
> > >> possible after one tasks trace RCU grace period, as shown below, and
> > >> the concurrent llist_del_first() must have completed by the time the
> > >> head is reused and re-added to free_by_rcu_ttrace.
> > >>
> > >> // c0->free_by_rcu_ttrace
> > >> A -> B -> C -> nil
> > >>
> > >> P1:
> > >> alloc_bulk()
> > >>     llist_del_first(&c->free_by_rcu_ttrace)
> > >>         entry = A
> > >>         next = B
> > >>
> > >> P2:
> > >> do_call_rcu_ttrace()
> > >>     // c->free_by_rcu_ttrace->first = NULL
> > >>     llist_del_all(&c->free_by_rcu_ttrace)
> > >>         move to c->waiting_for_gp_ttrace
> > >>
> > >> P1:
> > >> llist_del_first()
> > >>     return NULL
> > >>
> > >> // A is only reusable after one tasks trace RCU grace period;
> > >> // by then llist_del_first() must have been completed
> > > "must have been completed" ?
> > >
> > > I guess you're assuming that alloc_bulk() from irq_work
> > > is running within an RCU Tasks Trace critical section,
> > > so the __free_rcu_tasks_trace() callback will execute
> > > after the irq work has completed?
> > > I don't think that's the case.
> >
> > Yes. The following is my original thoughts. Correct me if I was wrong:
> >
> > 1. llist_del_first() must be running concurrently with llist_del_all().
> > If llist_del_first() runs after llist_del_all(), it will return NULL
> > directly.
> > 2. call_rcu_tasks_trace() must happen after llist_del_all(), else the
> > elements in free_by_rcu_ttrace will not be freed back to slab.
> > 3. call_rcu_tasks_trace() will wait for one tasks trace RCU grace period
> > to call __free_rcu_tasks_trace()
> > 4. llist_del_first() is running in a context with irqs disabled, so the
> > tasks trace RCU grace period will wait for the end of llist_del_first()
> >
> > It seems you thought step 4) is not true, right ?
> 
> Yes. I think so. For two reasons:
> 
> 1.
> I believe an irq-disabled region isn't considered equivalent
> to an rcu_read_lock_trace() region.
> 
> Paul,
> could you clarify ?

You are correct, Alexei.  Unlike vanilla RCU, RCU Tasks Trace does not
count irq-disabled regions of code as readers.

But why not just put an rcu_read_lock_trace() and a matching
rcu_read_unlock_trace() within that irq-disabled region of code?

For completeness, if it were not for CONFIG_TASKS_TRACE_RCU_READ_MB,
Hou Tao would be correct from a strict current-implementation
viewpoint.  The reason is that, given the current implementation in
CONFIG_TASKS_TRACE_RCU_READ_MB=n kernels, a task must either block or
take an IPI in order for the grace-period machinery to realize that this
task is done with all prior readers.

However, we need to account for the possibility of IPI-free
implementations, for example, if the real-time guys decide to start
making heavy use of BPF sleepable programs.  They would then insist on
getting rid of those IPIs for CONFIG_PREEMPT_RT=y kernels.  At which
point, irq-disabled regions of code will absolutely not act as
RCU tasks trace readers.

Again, why not just put an rcu_read_lock_trace() and a matching
rcu_read_unlock_trace() within that irq-disabled region of code?

Or maybe there is a better workaround.
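
For concreteness, the suggestion amounts to something like this around
the lockless pop in alloc_bulk().  This is only a sketch: the
surrounding code in the BPF memory allocator is elided, and the exact
call site is assumed rather than quoted.

	/* Sketch: make the lockless pop an explicit RCU Tasks Trace
	 * reader, so that the tasks trace grace period is guaranteed
	 * to wait for it, independent of whether irq-disabled regions
	 * happen to count as readers in the current implementation.
	 */
	rcu_read_lock_trace();
	obj = llist_del_first(&c->free_by_rcu_ttrace);
	rcu_read_unlock_trace();

That makes the llist_del_first()/llist_del_all() interleaving safe by
construction, at the cost of a read-side marker on the allocation
fast path.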

> 2.
> Even if 1 is incorrect, in RT llist_del_first() from alloc_bulk()
> runs "in a per-CPU thread in preemptible context."
> See irq_work_run_list.

Agreed, under RT, "interrupt handlers" often run in task context.

						Thanx, Paul



