Re: [PATCH bpf 1/2] bpf: Wait for busy refill_work when destroying bpf memory allocator

Hi,

On 10/21/2022 1:49 AM, Stanislav Fomichev wrote:
> On Wed, Oct 19, 2022 at 6:08 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> On 10/20/2022 2:38 AM, sdf@xxxxxxxxxx wrote:
>>> On 10/19, Hou Tao wrote:
>>>> From: Hou Tao <houtao1@xxxxxxxxxx>
SNIP
>>>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>>>> index 94f0f63443a6..48e606aaacf0 100644
>>>> --- a/kernel/bpf/memalloc.c
>>>> +++ b/kernel/bpf/memalloc.c
>>>> @@ -497,6 +497,16 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
>>>>           rcu_in_progress = 0;
>>>>           for_each_possible_cpu(cpu) {
>>>>               c = per_cpu_ptr(ma->cache, cpu);
>>>> +            /*
>>>> +             * refill_work may be unfinished for PREEMPT_RT kernel
>>>> +             * in which irq work is invoked in a per-CPU RT thread.
>>>> +             * It is also possible for kernel with
>>>> +             * arch_irq_work_has_interrupt() being false and irq
>>>> +             * work is inovked in timer interrupt. So wait for the
>>>> +             * completion of irq work to ease the handling of
>>>> +             * concurrency.
>>>> +             */
>>>> +            irq_work_sync(&c->refill_work);
>>> Does it make sense to guard these with "IS_ENABLED(CONFIG_PREEMPT_RT)" ?
>>> We do have a bunch of them sprinkled already to run alloc/free with
>>> irqs disabled.
>> No. As said in the commit message and the comments, irq_work_sync() is needed
>> for both PREEMPT_RT kernels and kernels where arch_irq_work_has_interrupt()
>> returns false. For other kernels, irq_work_sync() doesn't incur any overhead,
>> because it is just a simple memory read through irq_work_is_busy() and nothing
>> else: on those kernels the irq work must already have completed by the time
>> bpf_mem_alloc_destroy() is invoked.
>>
>> void irq_work_sync(struct irq_work *work)
>> {
>>         /* PREEMPT_RT and arch_irq_work_has_interrupt() code paths omitted */
>>         while (irq_work_is_busy(work))
>>                 cpu_relax();
>> }
> I see, thanks for clarifying! I was so carried away with that
> PREEMPT_RT that I missed the fact that arch_irq_work_has_interrupt is
> a separate thing. Agreed that doing irq_work_sync won't hurt in a
> non-preempt/non-has_interrupt case.
>
> In this case, can you still do a respin and fix the spelling issue in
> the comment? You can slap my acked-by for the v2:
>
> Acked-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
>
> s/work is inovked in timer interrupt. So wait for the/... invoked .../
Thanks. Will update the commit message and the comments in v2 to fix the typos,
and add a note that there is no extra overhead on non-PREEMPT_RT kernels where
arch_irq_work_has_interrupt() is true.
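
For reference, on such kernels the wait boils down to a single atomic read of the
work's flags. A rough sketch of irq_work_is_busy(), paraphrasing the helper in
include/linux/irq_work.h (field names may differ between kernel versions):

static inline bool irq_work_is_busy(struct irq_work *work)
{
        /* Plain atomic read of the flags word: once the work item has
         * finished executing, IRQ_WORK_BUSY is cleared and the while
         * loop in irq_work_sync() exits immediately.
         */
        return atomic_read(&work->node.a_flags) & IRQ_WORK_BUSY;
}
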
>
>>> I was also trying to see if adding local_irq_save inside drain_mem_cache
>>> to pair with the ones from refill might work, but waiting for irq to
>>> finish seems easier...
>> Disabling hard irqs would work, but irq_work_sync() is still needed to ensure
>> the irq work has completed before its memory is freed.
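
To illustrate the alternative mentioned above, a hypothetical sketch (the
drain_mem_cache_irqsave() wrapper name is made up for this discussion;
drain_mem_cache() and struct bpf_mem_cache are from kernel/bpf/memalloc.c):

/* Hypothetical wrapper, only to show the "pair with refill" idea:
 * run the drain with hard irqs disabled on the local CPU.
 */
static void drain_mem_cache_irqsave(struct bpf_mem_cache *c)
{
        unsigned long flags;

        local_irq_save(flags);
        drain_mem_cache(c);
        local_irq_restore(flags);
}

Even with such a wrapper, irq_work_sync(&c->refill_work) would still be required
before the per-CPU caches are freed, because the work item itself may still be
running.
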
>>> Maybe also move both of these into some new "static void irq_work_wait"
>>> to make it clear that the PREEMPT_RT comment applies to both of them?
>>>
>>> Or maybe that helper should do 'for_each_possible_cpu(cpu)
>>> irq_work_sync(&c->refill_work);'
>>> in the PREEMPT_RT case so we don't have to call it twice?
>> drain_mem_cache() can also be time consuming sometimes, so I think it is better
>> to interleave irq_work_sync() and drain_mem_cache() to reduce the total waiting
>> time.
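
For comparison, the separate helper suggested above would look roughly like the
sketch below (irq_work_wait() is a hypothetical name; ma->cache and refill_work
follow the definitions in kernel/bpf/memalloc.c). Keeping the waits in their own
loop means a second walk over all possible CPUs, whereas interleaving them with
drain_mem_cache() lets the drain of one CPU overlap with refill_work finishing
on the others:

/* Hypothetical helper, sketched only for comparison with the
 * interleaved form used in the patch below.
 */
static void irq_work_wait(struct bpf_mem_alloc *ma)
{
        int cpu;

        for_each_possible_cpu(cpu)
                irq_work_sync(&per_cpu_ptr(ma->cache, cpu)->refill_work);
}
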
>>
>>>>               drain_mem_cache(c);
>>>>               rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
>>>>           }
>>>> @@ -511,6 +521,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
>>>>               cc = per_cpu_ptr(ma->caches, cpu);
>>>>               for (i = 0; i < NUM_CACHES; i++) {
>>>>                   c = &cc->cache[i];
>>>> +                irq_work_sync(&c->refill_work);
>>>>                   drain_mem_cache(c);
>>>>                   rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
>>>>               }
>>>> --
>>>> 2.29.2



