Re: [PATCH bpf-next v9 2/6] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation

On 3/13/25 09:44, Michal Hocko wrote:
> On Wed 12-03-25 12:06:10, Shakeel Butt wrote:
>> On Wed, Mar 12, 2025 at 11:00:20AM +0100, Vlastimil Babka wrote:
>> [...]
>> > 
>> > But if we can achieve the same without such reserved objects, I think it's
>> > even better. Performance and maintainability doesn't need to necessarily
>> > suffer. Maybe it can even improve in the process. E.g. if we build upon
>> > patches 1+4 and switch memcg stock locking to the non-irqsave variant, we
>> > should avoid some overhead there (something similar was tried there in the
>> > past but reverted when making it RT compatible).
>> 
>> In hindsight that revert was a bad decision. We accepted so much
>> complexity in memcg code for RT without questioning whether there was a
>> real-world use-case. Are there really RT users who want memcg or are
>> using memcg? I cannot think of an RT user who would be fine with memcg
>> limit enforcement (reclaim and throttling).
> 
> I do not think that there is any reasonable RT workload that would use
> memcg limits or other memcg features. On the other hand it is not
> unusual to have RT and non-RT workloads mixed on the same machine. They
> usually use some sort of CPU isolation to prevent from CPU contention
> but that doesn't help much if there are other resources they need to
> contend for (like shared locks). 
> 
>> I am on the path to bypass per-cpu memcg stocks for RT kernels.
> 
> That would cause regressions for non-RT tasks running on PREEMPT_RT
> kernels, right?

For context, this is about commit 559271146efc ("mm/memcg: optimize user
context object stock access"),

reverted in fead2b869764 ("mm/memcg: revert ("mm/memcg: optimize user
context object stock access")").

I think at this point we don't have to recreate the full approach of the
first commit and introduce separate in_task() and in-interrupt stocks again.

The localtry_lock itself should make it possible to avoid the
irqsave/restore overhead (which was the main performance benefit of
559271146efc [1]) and only end up bypassing the stock when an allocation
from irq context actually interrupts an allocation from task context - which
would be very rare. And it should be already RT compatible. Let me see how
hard it would be on top of patch 4/6 "memcg: Use trylock to access memcg
stock_lock" to switch to the variant without _irqsave...

[1] the revert cites benchmarks showing that irqsave/restore can actually be
cheaper than preempt disable/enable, but I believe those were flawed
