Re: [PATCH bpf-next v9 2/6] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation

On Thu 13-03-25 15:21:48, Vlastimil Babka wrote:
> On 3/13/25 09:44, Michal Hocko wrote:
> > On Wed 12-03-25 12:06:10, Shakeel Butt wrote:
> >> On Wed, Mar 12, 2025 at 11:00:20AM +0100, Vlastimil Babka wrote:
> >> [...]
> >> > 
> >> > But if we can achieve the same without such reserved objects, I think it's
> >> > even better. Performance and maintainability doesn't need to necessarily
> >> > suffer. Maybe it can even improve in the process. E.g. if we build upon
> >> > patches 1+4 and swith memcg stock locking to the non-irqsave variant, we
> >> > should avoid some overhead there (something similar was tried there in the
> >> > past but reverted when making it RT compatible).
> >> 
> >> In hindsight that revert was a bad decision. We accepted so much
> >> complexity in the memcg code for RT without questioning whether there
> >> is a real-world use-case. Are there really RT users who want memcg or
> >> are using memcg? I cannot think of an RT user who would be fine with
> >> memcg limit enforcement (reclaim and throttling).
> > 
> > I do not think that there is any reasonable RT workload that would use
> > memcg limits or other memcg features. On the other hand it is not
> > unusual to have RT and non-RT workloads mixed on the same machine. They
> > usually use some sort of CPU isolation to prevent from CPU contention
> > but that doesn't help much if there are other resources they need to
> > contend for (like shared locks). 
> > 
> >> I am on the path to bypassing the per-cpu memcg stocks for RT kernels.
> > 
> > That would cause regressions for non-RT tasks running on PREEMPT_RT
> > kernels, right?
> 
> For context, this is about commit 559271146efc ("mm/memcg: optimize user
> context object stock access")
> 
> reverted in fead2b869764 ("mm/memcg: revert ("mm/memcg: optimize user
> context object stock access")")
> 
> I think at this point we don't have to recreate the full approach of the
> first commit and introduce separate in_task() and in-interrupt stocks again.
> 
> The localtry_lock itself should make it possible to avoid the
> irqsave/restore overhead (which was the main performance benefit of
> 559271146efc [1]) and only end up bypassing the stock when an allocation
> from irq context actually interrupts an allocation from task context - which
> would be very rare. And it should be already RT compatible. Let me see how
> hard it would be on top of patch 4/6 "memcg: Use trylock to access memcg
> stock_lock" to switch to the variant without _irqsave...

makes sense.

> [1] the revert cites benchmarks showing that irqsave/restore can actually
> be cheaper than preempt disable/enable, but I believe those were flawed

-- 
Michal Hocko
SUSE Labs
