On Tue 26-01-21 14:38:14, Vincent Guittot wrote:
> On Tue, 26 Jan 2021 at 09:52, Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Thu 21-01-21 19:19:21, Vlastimil Babka wrote:
> > [...]
> > > We could also start questioning the very assumption that number of cpus should
> > > affect slab page size in the first place. Should it? After all, each CPU will
> > > have one or more slab pages privately cached, as we discuss in the other
> > > thread... So why make the slab pages also larger?
> >
> > I do agree. What is the actual justification for this scaling?
> >         /*
> >          * Attempt to find best configuration for a slab. This
> >          * works by first attempting to generate a layout with
> >          * the best configuration and backing off gradually.
> >          *
> >          * First we increase the acceptable waste in a slab. Then
> >          * we reduce the minimum objects required in a slab.
> >          */
> >
> > doesn't speak about CPUs. 9b2cd506e5f2 ("slub: Calculate min_objects
> > based on number of processors.") does talk about hackbench "This has
> > been shown to address the performance issues in hackbench on 16p etc."
> > but it doesn't give any more details to tell actually _why_ that works.
> >
> > This thread shows that this is still somehow related to performance but
> > the real reason is not clear. I believe we should be focusing on the
> > actual reasons for the performance impact rather than playing with some
> > fancy math and tuning for a benchmark on a particular machine which
> > doesn't work for others due to subtle initialization timing issues.
> >
> > Fundamentally, why should a higher number of CPUs imply the size of the
> > slab in the first place?
>
> A first answer is that the activity and the number of threads involved
> scale with the number of CPUs. Taking the hackbench benchmark as an
> example, the number of groups/threads raises to a higher level on the
> server than on the small system, which doesn't seem unreasonable.
>
> On 8 CPUs, I run hackbench with up to 16 groups, which means 16*40
> threads. But I raise it up to 256 groups, which means 256*40 threads,
> on the 224-CPU system. In fact, hackbench -g 1 (with 1 group) doesn't
> regress on the 224-CPU system. The next test with 4 groups starts
> to regress by -7%. But the next one, hackbench -g 16, regresses by 187%
> (the duration is almost 3 times longer). It seems reasonable to assume
> that the number of running threads and resources scales with the number
> of CPUs because we want to run more stuff.

OK, I do understand that more jobs scale with the number of CPUs, but I
would also expect that higher-order pages are generally more expensive
to get, so this is not really a clear cut, especially under some more
demand on the memory where allocations are not as smooth.

So the question really is whether this is not just optimizing for
artificial conditions.
-- 
Michal Hocko
SUSE Labs