On Fri, 2023-08-11 at 03:06 +0900, Hyeonggon Yoo wrote:
> On Thu, Aug 10, 2023 at 7:56 PM Jay Patel <jaypatel@xxxxxxxxxxxxx>
> wrote:
> > On Mon, 2023-07-24 at 04:09 +0900, Hyeonggon Yoo wrote:
> > > Hello folks,
> > >
> > > This series is motivated by the kernel test bot report [1] on
> > > Jay's patch that modifies slab order. While the patch was not
> > > merged and not in its final form, I think it was a good lesson
> > > that changing slab order has more impact on performance than we
> > > expected.
> > >
> > > While inspecting the report, I found some potential points to
> > > improve SLUB. [2] It's _potential_ because it shows no improvement
> > > on hackbench, but I believe more realistic workloads would benefit
> > > from this. Due to lack of resources and my limited understanding
> > > of *realistic* workloads, I am asking you to help evaluate this
> > > together.
> >
> > Hi Hyeonggon,
> > I tried the hackbench test on a PowerPC machine with 16 CPUs but
> > got ~32% regression with the patch.
>
> Thank you so much for measuring this! That's very helpful.
> It's interesting because on an AMD machine with 2 NUMA nodes there
> was not much difference.
>
> Does it have more than one socket?

I have tested on a single socket system.

> Could you confirm if the offending patch is patch 1 or 2?
> If the offending one is patch 2, can you please check how large the
> L3 cache miss rate is during hackbench?

The regression below is caused by Patch 1, Revert "mm, slub: change
percpu partial accounting from objects to pages". (A simplified sketch
of the two accounting schemes that patch switches between is appended
at the end of this message.)

Thanks
Jay Patel

> > Results are as follows:
> >
> > +-------+----+---------+------------+------------+
> > |       |    | Normal  | With Patch |            |
> > +-------+----+---------+------------+------------+
> > | Amean | 1  | 1.3700  | 2.0353     | ( -32.69%) |
> > | Amean | 4  | 5.1663  | 7.6563     | ( -32.52%) |
> > | Amean | 7  | 8.9180  | 13.3353    | ( -33.13%) |
> > | Amean | 12 | 15.4290 | 23.0757    | ( -33.14%) |
> > | Amean | 21 | 27.3333 | 40.7823    | ( -32.98%) |
> > | Amean | 30 | 38.7677 | 58.5300    | ( -33.76%) |
> > | Amean | 48 | 62.2987 | 92.9850    | ( -33.00%) |
> > | Amean | 64 | 82.8993 | 123.4717   | ( -32.86%) |
> > +-------+----+---------+------------+------------+
> >
> > Thanks
> > Jay Patel
> >
> > > It only consists of two patches. Patch #1 addresses an inaccuracy
> > > in SLUB's heuristic, which can negatively affect workloads'
> > > performance when large folios are not available from buddy.
> > >
> > > Patch #2 changes SLUB's behavior when there are no slabs
> > > available on the local node's partial slab list, increasing NUMA
> > > locality when there is available memory (without reclamation) on
> > > the local node from buddy.
> > >
> > > This is at an early stage, but I think it's good enough to start
> > > a discussion. Any feedback and ideas are welcome. Thank you in
> > > advance!
> > >
> > > Hyeonggon
> > >
> > > [1] https://lore.kernel.org/linux-mm/202307172140.3b34825a-oliver.sang@xxxxxxxxx
> > > [2] https://lore.kernel.org/linux-mm/CAB=+i9S6Ykp90+4N1kCE=hiTJTE4wzJDi8k5pBjjO_3sf0aeqg@xxxxxxxxxxxxxx
> > >
> > > Hyeonggon Yoo (2):
> > >   Revert "mm, slub: change percpu partial accounting from objects
> > >     to pages"
> > >   mm/slub: prefer NUMA locality over slight memory saving on NUMA
> > >     machines
> > >
> > >  include/linux/slub_def.h |  2 --
> > >  mm/slab.h                |  6 ++++
> > >  mm/slub.c                | 76 ++++++++++++++++++++++++++--------------
> > >  3 files changed, 55 insertions(+), 29 deletions(-)
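To make the heuristic change behind patch 1 concrete, below is a
simplified C sketch contrasting the two per-cpu partial accounting
schemes: the object-based limit that the revert restores, and the
page(slab)-based limit being reverted. This is not the actual
mm/slub.c code; the types, function names, and thresholds are invented
purely for illustration.

/*
 * Invented illustration, not kernel code: two ways to decide when a
 * per-cpu partial list has grown large enough to be flushed.
 */
#include <stdbool.h>
#include <stdio.h>

struct fake_slab {
	int objects;	/* total objects the slab can hold */
	int inuse;	/* objects currently allocated */
};

/*
 * Object-based limit (the behaviour patch 1 restores): the list is
 * considered full once the number of cached *free objects* exceeds
 * the limit, so nearly-full slabs barely count against it.
 */
static bool flush_by_objects(const struct fake_slab *slabs, int nr,
			     int object_limit)
{
	int free_objects = 0;

	for (int i = 0; i < nr; i++)
		free_objects += slabs[i].objects - slabs[i].inuse;
	return free_objects > object_limit;
}

/*
 * Page/slab-based limit (the behaviour being reverted): only the
 * number of slabs matters, regardless of how full each one is.
 */
static bool flush_by_slabs(int nr, int slab_limit)
{
	return nr >= slab_limit;
}

int main(void)
{
	/* Two nearly-full slabs: only 3 free objects in total. */
	const struct fake_slab slabs[] = { { 32, 30 }, { 32, 31 } };

	printf("object-based says flush: %d\n",
	       flush_by_objects(slabs, 2, 30));	/* 0: only 3 free */
	printf("slab-based says flush:   %d\n",
	       flush_by_slabs(2, 2));		/* 1: two slabs cached */
	return 0;
}

The point is only that the two limits can disagree, so switching from
one to the other changes how many partial slabs each cpu ends up
caching, which is plausibly what hackbench is sensitive to.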
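Patch 2's proposed fallback order can be summarized the same way. The
sketch below is likewise invented (it is not the actual get_partial()
logic, and every name in it is made up); it only encodes the ordering
described in the cover letter: local partial list first, then a fresh
slab from the local node's buddy if one is available without
reclamation, and a remote node's partial list only as a last resort.

/*
 * Made-up sketch of the fallback order patch 2 proposes; not the
 * actual mm/slub.c logic.
 */
#include <stdbool.h>
#include <stdio.h>

enum slab_source {
	LOCAL_PARTIAL,	/* reuse a partial slab on the local node */
	LOCAL_NEW_SLAB,	/* allocate a fresh slab from local buddy */
	REMOTE_PARTIAL,	/* take a partial slab from another node */
};

static enum slab_source pick_slab_source(bool local_partial_nonempty,
					 bool local_buddy_has_free_pages)
{
	if (local_partial_nonempty)
		return LOCAL_PARTIAL;	/* cheapest and local */

	/*
	 * Patch 2's change: with no local partial slabs, try a new
	 * local slab first (as long as no reclamation is needed)
	 * instead of immediately scanning remote partial lists.
	 */
	if (local_buddy_has_free_pages)
		return LOCAL_NEW_SLAB;

	return REMOTE_PARTIAL;
}

int main(void)
{
	/* Empty local partial list, but local free pages available. */
	printf("%d\n", pick_slab_source(false, true)); /* LOCAL_NEW_SLAB */
	return 0;
}

This trades a little memory (a new slab instead of reusing a remote
partial one) for NUMA locality, which is why the L3 miss rate question
above is the right thing to measure if patch 2 turns out to be the
offender.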