Re: [PATCH] mm: memcg: provide accurate stats for userspace reads

Tejun Heo <tj@xxxxxxxxxx> · Mon, 14 Aug 2023 14:35:25 -1000

Hello,

On Mon, Aug 14, 2023 at 05:28:22PM -0700, Yosry Ahmed wrote:
> > So, the original design used mutex for synchronize flushing with the idea
> > being that updates are high freq but reads are low freq and can be
> > relatively slow. Using rstats for mm internal operations changed this
> > assumption quite a bit and we ended up switching that mutex with a lock.
> 
> Naive question, do mutexes handle thundering herd problems better than
> spinlocks? I would assume so but I am not sure.

I don't know. We can ask Waiman if that becomes a problem.

> > * Flush-side, maybe we can break flushing into per-cpu or whatnot but
> >   there's no avoiding the fact that flushing can take quite a while if there
> >   are a lot to flush whether locks are split or not. I wonder whether it'd
> >   be possible to go back to mutex for flushing and update the users to
> >   either consume the cached values or operate in a sleepable context if
> >   synchronous read is necessary, which is the right thing to do anyway given
> >   how long flushes can take.
> 
> Unfortunately it cannot be broken down into per-cpu as all flushers
> update the same per-cgroup counters, so we need a bigger locking
> scope. Switching to atomics really hurts performance. Breaking down
> the lock to be per-cgroup is doable, but since we need to lock both
> the parent and the cgroup, flushing top-level cgroups (which I assume
> is most common) will lock the root anyway.

Plus, there's not much point in flushing in parallel, so I don't feel too
enthusiastic about splitting flush locking.

> All flushers right now operate in sleepable context, so we can go
> again to the mutex if you think this will make things better. The

Yes, I think that'd be more sane.

> slowness problem reported recently is in a sleepable context, it's
> just too slow for userspace if I understand correctly.

I mean, there's a certain amount of work to do. There's no way around it if
you wanna read the counters synchronously. The only solution there would be
using a cached value or having some sort of auto-flushing mechanism so that
the amount to flush don't build up too much - e.g. keep a count of the
number of entries to flush and trigger flush if it goes over some threshold.

Thanks.

-- 
tejun