I was thinking more about the number of slab pages on the per-cpu partial lists, rather than the size of the objects themselves, being an issue. I believe that is /sys/kernel/slab/*/cpu_partial.
That setting could be tuned further before merging: increasing it causes additional memory to be caught in the partial lists, but it also reduces node lock pressure further.
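For reference, a minimal sketch of reading and adjusting that limit from userspace, assuming a kernel with these patches that exposes a writable cpu_partial attribute under /sys/kernel/slab/ (the cache name kmalloc-256 is only an example, not part of the measurements below):

    #!/usr/bin/env python3
    # Minimal sketch (not the actual tuning tool): read and optionally
    # set the per-cpu partial limit of one SLUB cache through sysfs.
    # Assumes /sys/kernel/slab/<cache>/cpu_partial exists and is
    # writable, as with this patch set.
    import sys

    def cpu_partial_path(cache):
        return "/sys/kernel/slab/%s/cpu_partial" % cache

    def get_cpu_partial(cache):
        with open(cpu_partial_path(cache)) as f:
            return int(f.read().strip())

    def set_cpu_partial(cache, value):
        # Needs root. A larger value keeps more pages on the per-cpu
        # partial lists (more memory held) but reduces node lock
        # pressure; a smaller value does the opposite.
        with open(cpu_partial_path(cache), "w") as f:
            f.write("%d\n" % value)

    if __name__ == "__main__":
        cache = sys.argv[1] if len(sys.argv) > 1 else "kmalloc-256"
        print("%s cpu_partial = %d" % (cache, get_cpu_partial(cache)))
        if len(sys.argv) > 2:
            set_cpu_partial(cache, int(sys.argv[2]))
            print("%s cpu_partial now %d" % (cache, get_cpu_partial(cache)))

Something like "cpu_partial.py kmalloc-256 30" would then raise the limit for that one cache while benchmarking the memory/lock-contention trade-off.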
On Tue, Sep 13, 2011 at 3:29 AM, Alex,Shi <alex.shi@xxxxxxxxx> wrote:
> > Hmmm... The sizes of the per cpu partial objects could be varied a bit to
> > see if more would make an impact.
>
>
> I found roughly the following in one kbuild run:
> size 384 was allocated in the fastpath about 2900k times
> size 176 was allocated in the fastpath about 1900k times
> size 192 was allocated in the fastpath about 500k times
> anon_vma was allocated in the fastpath about 560k times
> size 72 was allocated in the fastpath about 600k times
> sizes 512, 256 and 128 were each allocated in the fastpath more than
> 100k times.
>
> I can give you the object sizes involved in my netperf testing later.
> Which test cases do you prefer? If I have them, I can collect data on
> them.
I wrote a short script to collect the alloc_fastpath usage of the
different object sizes (a sketch of such a script follows the output).
In the output below, the first column is the cache name and the second
is the number of alloc_fastpath calls.
:t-0000448 62693419
:t-0000384 1037746
:at-0000104 191787
:t-0000176 2051053
anon_vma 953578
:t-0000048 2108191
:t-0008192 17858636
:t-0004096 2307039
:t-0002048 21601441
:t-0001024 98409238
:t-0000512 14896189
:t-0000256 96731409
:t-0000128 221045
:t-0000064 149505
:t-0000032 638431
:t-0000192 263488
-----
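For reference, a minimal sketch of the kind of collection script used here (the real script may differ), assuming CONFIG_SLUB_STATS=y so that every cache exposes an alloc_fastpath counter under /sys/kernel/slab/:

    #!/usr/bin/env python3
    # Sketch: dump the alloc_fastpath counter of every SLUB cache,
    # sorted by call count. Assumes CONFIG_SLUB_STATS=y, which creates
    # the per-cache /sys/kernel/slab/<cache>/alloc_fastpath files.
    import os

    SLAB_DIR = "/sys/kernel/slab"

    def alloc_fastpath_counts():
        counts = {}
        for cache in os.listdir(SLAB_DIR):
            path = os.path.join(SLAB_DIR, cache, "alloc_fastpath")
            if not os.path.isfile(path):
                continue
            with open(path) as f:
                # The file holds the total followed by optional per-cpu
                # entries ("12345 C0=... C1=..."); keep only the total.
                counts[cache] = int(f.read().split()[0])
        return counts

    if __name__ == "__main__":
        for cache, count in sorted(alloc_fastpath_counts().items(),
                                   key=lambda kv: kv[1], reverse=True):
            print("%-20s %d" % (cache, count))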
The output above shows that sizes 448/1024/8192/2048/512/256 are used heavily.
So far, at least, both kbuild (with 4 jobs) and netperf loopback (one
server on CPU socket 1 and one client on CPU socket 2) show no clear
performance change on our machines:
NHM-EP/NHM-EX/WSM-EP/tigerton/core2-EP.