On Thu, 20 Dec 2018, Andrew Morton wrote:

> The result of (get_partial_count / get_partial_try_count):
>
> +----------+----------------+------------+-------------+
> |          |      Base      |  Patched   | Improvement |
> +----------+----------------+------------+-------------+
> |One Node  |      1:3       |    1:0     |   - 100%    |

If you have one node then you already searched all your slabs. So we could
completely skip the get_any_partial() functionality in the non-NUMA case
(if nr_node_ids == 1).

> +----------+----------------+------------+-------------+
> |Four Nodes|     1:5.8      |   1:2.5    |   - 56%     |
> +----------+----------------+------------+-------------+

Hmm.... Ok but that is the extreme slowpath.

> Each version/system configuration combination has four round kernel
> build tests. Take the average result of real to compare.
>
> +----------+----------------+------------+-------------+
> |          |      Base      |  Patched   | Improvement |
> +----------+----------------+------------+-------------+
> |One Node  |     4m41s      |   4m32s    |   - 4.47%   |
> +----------+----------------+------------+-------------+
> |Four Nodes|     4m45s      |   4m39s    |   - 2.92%   |
> +----------+----------------+------------+-------------+

3% on the four node case? That means that the slowpath is taken
frequently. Wonder why?

Can we also see the variability? Since this is a NUMA system there is
bound to be some indeterminism in those numbers.