On Fri, Jan 22, 2021 at 2:05 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> On Thu, Jan 21, 2021 at 7:19 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> > On 1/21/21 11:01 AM, Christoph Lameter wrote:
> > > On Thu, 21 Jan 2021, Bharata B Rao wrote:
> > > >
> > >> > The problem is that calculate_order() is called a number of times
> > >> > before the secondary CPUs are booted and it returns 1 instead of 224.
> > >> > This makes the use of num_online_cpus() irrelevant for those cases.
> > >> >
> > >> > After adding "slub_min_objects=36" to my command line, which equals
> > >> > 4 * (fls(num_online_cpus()) + 1) with a correct num_online_cpus == 224,
> > >> > the regression disappears:
> > >> >
> > >> > 9 iterations of hackbench -l 16000 -g 16: 3.201sec (+/- 0.90%)
> >
> > I'm surprised that hackbench is that sensitive to slab performance, anyway. It's
> > supposed to be a scheduler benchmark? What exactly is going on?
>
> Uuuh, I think powerpc doesn't have cmpxchg_double?
>
> "vgrep cmpxchg_double arch/" just spits out arm64, s390 and x86? And
> <https://liblfds.org/mediawiki/index.php?title=Article:CAS_and_LL/SC_Implementation_Details_by_Processor_family>
> says under "POWERPC": "no DW LL/SC"
>
> So powerpc is probably hitting the page-bitlock-based implementation
> all the time for stuff like __slab_free()? Do you have detailed
> profiling results from "perf top" or something like that?
>
> (I actually have some WIP patches and a design document for getting
> rid of cmpxchg_double in struct page that I hacked together in the
> last couple days; I'm currently in the process of sending them over to
> some other folks in the company who hopefully have cycles to
> review/polish/benchmark them so that they can be upstreamed, assuming
> that those folks think they're important enough. I don't have the
> cycles for it...)

(The stuff I have in mind will only work on 64-bit though. We are talking about PPC64 here, right?)