Re: [RFC PATCH] mm, slub: change percpu partial accounting from objects to pages

Vlastimil Babka <vbabka@xxxxxxx> · Wed, 15 Sep 2021 10:42:06 +0200

On 9/15/21 07:32, David Rientjes wrote:
> On Mon, 13 Sep 2021, Vlastimil Babka wrote:
> 
>> While this is no longer a problem in kmemcg context thanks to the accounting
>> rewrite in 5.9, the memory waste is still not ideal and it's questionable
>> whether it makes sense to perform free object count based control when object
>> counts can easily become so inaccurate. So this patch converts the
>> accounting to be based on number of pages only (which is precise) and removes
>> the page->pobjects field completely. This is also ultimately simpler.
>> 
> 
> Thanks for the very detailed explanation, this is very timely for us.
> 
> I'm wondering if we should be concerned about the memory waste even being 
> possible, though, now that we have the kmemcg accounting change?
> 
> IIUC, because we're accounting objects and not pages, then it *seems* like 
> we could have a high number of pages but very few objects charged per 
> page so this memory waste could go unconstrained from any kmemcg 
> limitation.

The main problem before 5.9 was that each memcg had its own copies of the kmem
caches, each with its own percpu partial lists, so the memory used scaled with
caches x cpus x memcgs. Now the caches are shared, so it's just caches x cpus.
What you're describing is also true, but it's a much smaller issue than what
we had before 5.9.
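
To put rough numbers on that scaling, here is a minimal sketch. The function
names are made up for illustration and are not kernel symbols; they only count
how many percpu partial lists can exist in each scheme.

```c
/*
 * Illustrative only: hypothetical helpers, not kernel code.
 * Each one returns the number of percpu partial lists that can exist.
 */

/* Before 5.9: every memcg had its own copy of each cache. */
static unsigned long partial_lists_before_5_9(unsigned long caches,
					      unsigned long cpus,
					      unsigned long memcgs)
{
	return caches * cpus * memcgs;
}

/* Since 5.9: caches are shared between memcgs. */
static unsigned long partial_lists_since_5_9(unsigned long caches,
					     unsigned long cpus)
{
	return caches * cpus;
}
```

With made-up figures of 100 caches, 64 cpus and 50 memcgs, that is 320000
lists before 5.9 versus 6400 now.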

>> To retain the existing set_cpu_partial() heuristic, first calculate the target
>> number of objects as previously, but then convert it to target number of pages
>> by assuming the pages will be half-filled on average. This assumption might
>> obviously also be inaccurate in practice, but it cannot degrade to the actual
>> number of pages being equal to the target number of objects.
>> 
> 
> I think that's a fair heuristic.
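
For reference, the conversion described above can be sketched in userspace
roughly as follows. The names are hypothetical and this is not the code from
the patch; it only shows the arithmetic, assuming a half-full page contributes
objs_per_page / 2 free objects.

```c
/* Hypothetical userspace sketch; names are illustrative, not the patch's. */

/* Round-up integer division, in the spirit of the kernel's DIV_ROUND_UP(). */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
	return (n + d - 1) / d;
}

/*
 * Convert a target number of free objects into a target number of pages,
 * assuming each page on the percpu partial list is half-full on average.
 * Dividing by (objs_per_page / 2) is written as multiplying by 2 first,
 * which avoids truncation for odd objs_per_page.
 */
static unsigned int partial_target_pages(unsigned int nr_objects,
					 unsigned int objs_per_page)
{
	return div_round_up(nr_objects * 2, objs_per_page);
}
```

For example, a target of 30 objects with 32 objects per page gives
partial_target_pages(30, 32) == 2. If the pages turn out emptier than
half-full the list simply holds fewer free objects than the target, but the
page count stays bounded, unlike with object-based counting.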
> 
>> We could also skip the intermediate step with target number of objects and
>> rewrite the heuristic in terms of pages. However we still have the sysfs file
>> cpu_partial which uses number of objects and could break existing users if it
>> suddenly becomes number of pages, so this patch doesn't do that.
>> 
>> In practice, after this patch the heuristics limit the size of the percpu
>> partial list to at most 2 pages. In case of a reported regression (which would
>> mean some workload has benefited from the previous imprecise object based
>> counting), we can tune the heuristics to get a better compromise within the
>> new scheme, while still avoiding the unexpectedly long percpu partial lists.
>> 
> 
> Curious if you've tried netperf TCP_RR with this change?  This benchmark 
> was the most significantly improved benchmark that I recall with the 
> introduction of per-cpu partial slabs for SLUB.  If there are any 
> regressions to be introduced by such an approach, I'm willing to bet that 
> it would be surfaced with that benchmark.

I'll try, thanks for the tip.