On Mon, Aug 19, 2024 at 4:02 PM Yongqiang Liu <liuyongqiang13@xxxxxxxxxx> wrote:
>
> commit 0ad9500e16fe ("slub: prefetch next freelist pointer in
> slab_alloc()") introduced prefetch_freepointer() for fastpath
> allocation. Using it at the first freelist load could give a slight
> improvement in some workloads. Here are hackbench results on an
> arm64 machine (about 3.8%):
>
> Before:
> average time cost of 'hackbench -g 100 -l 1000': 17.068
>
> After:
> average time cost of 'hackbench -g 100 -l 1000': 16.416
>
> There is also about a 5% improvement for hackbench on an x86_64
> machine.

I think adding more prefetches might not be a good idea unless we have
more real-world data supporting it: prefetching can help when the slab
is used frequently, but it ends up unnecessarily pulling in more cache
lines when it is not.

Also, I don't understand how adding a prefetch in the slowpath affects
performance, because most allocations/frees should be handled in the
fastpath. Could you please explain?

> Signed-off-by: Yongqiang Liu <liuyongqiang13@xxxxxxxxxx>
> ---
>  mm/slub.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index c9d8a2497fd6..f9daaff10c6a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3630,6 +3630,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>                 VM_BUG_ON(!c->slab->frozen);
>                 c->freelist = get_freepointer(s, freelist);
>                 c->tid = next_tid(c->tid);
> +               prefetch_freepointer(s, c->freelist);
>                 local_unlock_irqrestore(&s->cpu_slab->lock, flags);
>                 return freelist;
>
> --
> 2.25.1
>
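
For anyone skimming this thread: prefetch_freepointer() only issues a
prefetch hint for the cache line holding the next object's freelist
pointer. Below is a minimal userspace sketch of the idea, not the
kernel code; all names here (struct obj, pool_alloc, ...) are invented
for illustration, and it ignores SLUB details such as freelist pointer
hardening.

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy object with an embedded freelist pointer, like SLUB's s->offset. */
struct obj {
	struct obj *next;
	char payload[120];
};

static struct obj *freelist;

static struct obj *pool_alloc(void)
{
	struct obj *o = freelist;

	if (!o)
		return NULL;
	freelist = o->next;
	/*
	 * Hint the CPU to start pulling in the cache line that holds the
	 * next object's freelist pointer.  Pure hint: wasted bandwidth and
	 * cache footprint if the next allocation never comes.
	 */
	if (freelist)
		__builtin_prefetch(freelist, 1);
	return o;
}

int main(void)
{
	struct obj *pool = calloc(1024, sizeof(*pool));
	int n = 0;

	if (!pool)
		return 1;

	/* Chain all objects into the freelist; the last next stays NULL. */
	for (int i = 0; i < 1023; i++)
		pool[i].next = &pool[i + 1];
	freelist = pool;

	/* Drain it; each allocation prefetches the following object. */
	while (pool_alloc())
		n++;
	printf("allocated %d objects\n", n);
	free(pool);
	return 0;
}

Whether that hint pays off depends entirely on whether the next
allocation really follows soon, which is the real-world data question
above.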