Hi,

On 10/21/2022 1:49 AM, Stanislav Fomichev wrote:
> On Wed, Oct 19, 2022 at 6:08 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> On 10/20/2022 2:38 AM, sdf@xxxxxxxxxx wrote:
>>> On 10/19, Hou Tao wrote:
>>>> From: Hou Tao <houtao1@xxxxxxxxxx>

SNIP

>>>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>>>> index 94f0f63443a6..48e606aaacf0 100644
>>>> --- a/kernel/bpf/memalloc.c
>>>> +++ b/kernel/bpf/memalloc.c
>>>> @@ -497,6 +497,16 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
>>>>  		rcu_in_progress = 0;
>>>>  		for_each_possible_cpu(cpu) {
>>>>  			c = per_cpu_ptr(ma->cache, cpu);
>>>> +			/*
>>>> +			 * refill_work may be unfinished for PREEMPT_RT kernel
>>>> +			 * in which irq work is invoked in a per-CPU RT thread.
>>>> +			 * It is also possible for kernel with
>>>> +			 * arch_irq_work_has_interrupt() being false and irq
>>>> +			 * work is inovked in timer interrupt. So wait for the
>>>> +			 * completion of irq work to ease the handling of
>>>> +			 * concurrency.
>>>> +			 */
>>>> +			irq_work_sync(&c->refill_work);
>>> Does it make sense to guard these with "IS_ENABLED(CONFIG_PREEMPT_RT)" ?
>>> We do have a bunch of them sprinkled already to run alloc/free with
>>> irqs disabled.
>> No. As said in the commit message and the comments, irq_work_sync() is needed
>> for both the PREEMPT_RT kernel and kernels with arch_irq_work_has_interrupt()
>> being false. For other kernels, irq_work_sync() doesn't incur any overhead,
>> because it is just a simple memory read through irq_work_is_busy() and nothing
>> else. The reason is that the irq work must have already completed when
>> bpf_mem_alloc_destroy() is invoked on these kernels.
>>
>> void irq_work_sync(struct irq_work *work)
>> {
>> 	/* Code snippet for PREEMPT_RT and arch_irq_work_has_interrupt() removed */
>> 	while (irq_work_is_busy(work))
>> 		cpu_relax();
>> }
> I see, thanks for clarifying! I was so carried away with that
> PREEMPT_RT that I missed the fact that arch_irq_work_has_interrupt is
> a separate thing. Agreed that doing irq_work_sync won't hurt in a
> non-preempt/non-has_interrupt case.
>
> In this case, can you still do a respin and fix the spelling issue in
> the comment? You can slap my acked-by for the v2:
>
> Acked-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
>
> s/work is inovked in timer interrupt. So wait for the/... invoked .../

Thanks. Will update the commit message and the comments in v2 to fix the
typos and add a note about the fact that there is no overhead on
non-PREEMPT_RT and non-arch_irq_work_has_interrupt() kernels.

>
>>> I was also trying to see if adding local_irq_save inside drain_mem_cache
>>> to pair with the ones from refill might work, but waiting for irq to
>>> finish seems easier...
>> Disabling hard irq works, but irq_work_sync() is still needed to ensure the
>> irq work has completed before freeing its memory.
>>> Maybe also move both of these in some new "static void irq_work_wait"
>>> to make it clear that the PREEMPT_RT comment applies to both of them?
>>>
>>> Or maybe that helper should do 'for_each_possible_cpu(cpu)
>>> irq_work_sync(&c->refill_work);'
>>> in the PREEMPT_RT case so we don't have to call it twice?
>> drain_mem_cache() is also time consuming sometimes, so I think it is better to
>> interleave irq_work_sync() and drain_mem_cache() to reduce the waiting time.
>>
>>>>  			drain_mem_cache(c);
>>>>  			rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
>>>>  		}
>>>> @@ -511,6 +521,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
>>>>  		cc = per_cpu_ptr(ma->caches, cpu);
>>>>  		for (i = 0; i < NUM_CACHES; i++) {
>>>>  			c = &cc->cache[i];
>>>> +			irq_work_sync(&c->refill_work);
>>>>  			drain_mem_cache(c);
>>>>  			rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
>>>>  		}
>>>> --
>>>> 2.29.2
>>> .
> .