On Wed, Jan 17, 2024 at 12:20:59PM -0800, Linus Torvalds wrote:
> On Wed, 17 Jan 2024 at 11:39, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> >
> > That's a good point. If the microbenchmark isn't likely to be even
> > remotely realistic, maybe we should just revert the revert until if/when
> > somebody shows a real world impact.
> >
> > Linus, any objections to that?
>
> We use SLAB_ACCOUNT for much more common allocations like queued
> signals, so I would tend to agree with Jeff that it's probably just
> some not very interesting microbenchmark that shows any file locking
> effects from SLAB_ALLOC, not any real use.
>
> That said, those benchmarks do matter. It's very easy to say "not
> relevant in the big picture" and then the end result is that
> everything is a bit of a pig.
>
> And the regression was absolutely *ENORMOUS*. We're not talking "a few
> percent". We're talking a 33% regression that caused the revert:
>
>   https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
>
> I wish our SLAB_ACCOUNT wasn't such a pig. Rather than account every
> single allocation, it would be much nicer to account at a bigger
> granularity, possibly by having per-thread counters first before
> falling back to the obj_cgroup_charge. Whatever.
>
> It's kind of stupid to have a benchmark that just allocates and
> deallocates a file lock in quick succession spend lots of time
> incrementing and decrementing cgroup charges for that repeated
> alloc/free.
>
> However, that problem with SLAB_ACCOUNT is not the fault of file
> locking, but more of a slab issue.
>
> End result: I think we should bring in Vlastimil and whoever else is
> doing SLAB_ACCOUNT things, and have them look at that side.
>
> And then just enable SLAB_ACCOUNT for file locks. But very much look
> at silly costs in SLAB_ACCOUNT first, at least for trivial
> "alloc/free" patterns..
>
> Vlastimil? Who would be the best person to look at that SLAB_ACCOUNT
> thing?

Probably me. I recently did some work on improving the kmem accounting
performance, which is mentioned in this thread and shaves off about 30%:
https://lore.kernel.org/lkml/20231019225346.1822282-1-roman.gushchin@xxxxxxxxx/

Overall the SLAB_ACCOUNT overhead looks big on micro-benchmarks simply
because the SLAB allocation path is really fast, so even touching a
per-cpu variable adds a noticeable overhead. There is nothing
particularly slow on the kmem allocation and release paths, but saving
a memcg/objcg pointer and bumping the charge and stats adds up, even
though we have batching in place.

I believe the only real way to make it significantly faster is to cache
pre-charged slab objects, but that adds complexity and increases the
memory footprint.

So far this has all been about micro-benchmarks; I haven't seen any
complaints about the performance of real workloads.
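For anyone following along, the change under discussion is small: it just
adds SLAB_ACCOUNT when the file lock cache is created in fs/locks.c,
roughly like this (from memory, not a literal copy of the patch):

        filelock_cache = kmem_cache_create("file_lock_cache",
                        sizeof(struct file_lock), 0,
                        SLAB_PANIC | SLAB_ACCOUNT, NULL);

All of the accounting cost we're discussing comes from that one flag: with
it set, every kmem_cache_alloc()/kmem_cache_free() pair also charges and
uncharges the current memory cgroup and updates the slab stats.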
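To sketch what caching pre-charged objects could look like (purely
hypothetical, nothing like this exists in the kernel today; current_objcg()
below is a made-up stand-in for whatever helper would return the running
task's objcg):

        /* Hypothetical per-cpu cache of objects already charged to one objcg. */
        struct precharged_cache {
                struct obj_cgroup *objcg;  /* cgroup the cached objects are charged to */
                unsigned int nr;
                void *objs[16];            /* already charged, ready to hand out */
        };

        static DEFINE_PER_CPU(struct precharged_cache, flock_precharge);

        static void *flock_alloc_fast(struct kmem_cache *cache)
        {
                struct precharged_cache *c = get_cpu_ptr(&flock_precharge);
                void *obj = NULL;

                /*
                 * Fast path: pop an object that was charged earlier to the
                 * same cgroup, skipping the charge and stats updates entirely.
                 * current_objcg() is a stand-in name, not a real helper.
                 */
                if (c->nr && c->objcg == current_objcg())
                        obj = c->objs[--c->nr];
                put_cpu_ptr(&flock_precharge);

                /* Slow path: ordinary accounted allocation. */
                if (!obj)
                        obj = kmem_cache_alloc(cache, GFP_KERNEL);
                return obj;
        }

This also shows the downsides I mentioned: the per-cpu objs[] arrays pin
memory for every cache that uses them, and keeping the cached charges
consistent when cgroups go away adds complexity.

Thanks!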