Nick Piggin wrote:
> On Wednesday 27 August 2008 06:01, Mike Travis wrote:
>> Dave Jones wrote:
>> ...
>>
>>> But yes, for this to be even remotely feasible, there has to be a
>>> negligible performance cost associated with it, which right now, we
>>> clearly don't have. Given that the number of people running 4096 CPU
>>> boxes even in a few years' time will still be tiny, punishing the common
>>> case is obviously absurd.
>>>
>>> Dave
>> I did do some fairly extensive benchmarking between configs of NR_CPUS =
>> 128 and 4096 and most performance hits were in the neighborhood of < 5% on
>> systems with 8 cpus and 4GB of memory (our most common test system).
>
> 5% is a pretty nasty performance hit... what sort of benchmarks are we
> talking about here?

It's been a while now; I should go back and check my notes. Many of the
benchmarks showed no change. I believe the ones that were right on the
edge of paging were affected by the fact that less memory was available.

>
> I just made some pretty crazy changes to the VM to get "only" around 5
> or so % performance improvement in some workloads.
>
> What places are making heavy use of cpumasks that causes such a slowdown?
> Hopefully callers can mostly be improved so they don't need to use cpumasks
> for common cases.

That's another study I did, and it seemed that maybe 95% of the functions
would not be affected by passing pointers to cpumasks instead of the
cpumasks themselves, because the data was processed by a cpu_xxx function
that uses a pointer. The most common pattern was to create a temp cpumask,
using:

	cpus_and(temp_mask, callers_mask, cpu_online_map);

Using nr_cpu_ids instead of NR_CPUS in the traversal functions sped things
up quite a bit. Using the same method in the cpus_xxx functions would speed
things up further. (As would allocating each cpumask sized by nr_cpu_ids
instead of by NR_CPUS, as the current cpumask_t definition specifies.)
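To make the two points above concrete, here is a minimal userspace sketch,
not kernel code. Every name in it (sketch_cpumask, sketch_and,
sketch_nr_cpu_ids, and so on) is hypothetical and only stands in for
cpumask_t, cpus_and() and nr_cpu_ids. It assumes a mask sized for 4096 CPUs
on a box with 8 possible CPUs, passes the masks by pointer, and bounds the
AND by a bit count supplied by the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel built with NR_CPUS=4096. */
#define SKETCH_NR_CPUS 4096
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define WORDS_FOR(n)   (((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* 4096 bits = 512 bytes; passing this by value copies all of it. */
struct sketch_cpumask {
    unsigned long bits[WORDS_FOR(SKETCH_NR_CPUS)];
};

/* Hypothetical stand-in for nr_cpu_ids: CPUs actually possible here. */
static unsigned int sketch_nr_cpu_ids = 8;

static void sketch_clear(struct sketch_cpumask *m)
{
    memset(m, 0, sizeof(*m));
}

static void sketch_set_cpu(struct sketch_cpumask *m, unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    m->bits[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

static int sketch_test_cpu(const struct sketch_cpumask *m, unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    return (m->bits[cpu / BITS_PER_WORD] >> (cpu % BITS_PER_WORD)) & 1UL;
}

/*
 * dst = a & b over the first nbits bits only. All three masks are passed
 * by pointer. Words past the bound are not written, so dst must have been
 * cleared by the caller. Returns the number of words processed.
 */
static size_t sketch_and(struct sketch_cpumask *dst,
                         const struct sketch_cpumask *a,
                         const struct sketch_cpumask *b,
                         unsigned int nbits)
{
    size_t i, words = WORDS_FOR(nbits);

    assert(nbits <= SKETCH_NR_CPUS);
    for (i = 0; i < words; i++)
        dst->bits[i] = a->bits[i] & b->bits[i];
    return words;
}
```

Calling sketch_and(&tmp, &callers, &online, sketch_nr_cpu_ids) touches a
single word per mask, while passing SKETCH_NR_CPUS as the bound walks the
whole array (64 words with a 64-bit unsigned long). The saving is the same
in kind as the one described above, though the real numbers depend on the
workload.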
>
> Until then, it would be kind of sad for a distro to ship a generic x86
> kernel and lose 5% performance because it is set to 4096 CPUs...
>
> But if I misunderstand and you're talking about specific microbenchmarks to
> find the worst case for huge cpumasks, then I take that back.

Yes, I was (at the time) trying to determine how many of the cpumask
functions were actually exercised by user tasks, so I was zeroing in on
those (cpusets, rescheds, etc.)

>
>
>> [But changing cpumask_t's to be pointers instead of values will likely
>> increase this.] I've tried to be very sensitive to this issue with all my
>> previous changes, so convincing the distros to set NR_CPUS=4096 would be
>> as painless for them as possible. ;-)
>>
>> Btw, huge count cpu systems I don't think are that far away. I believe the
>> nextgen Larrabee chips will be geared towards HPC applications [instead of
>> just GFX apps], and putting 4 of these chips on a motherboard would add up
>> to 512 cpu threads (1024 if they support hyperthreading.)
>
> It would be quite interesting if they make them cache coherent / MP capable.
> Will they be?

There's not been a lot of info available yet, but I think the 128 cores
will share at least an L2 cache + memory controller. How the APICs
interact is another big question. And most likely some standard system
controller CPU will be needed, but that could be a tiny VIA processor... ;-)

Thanks,
Mike