(I tried to include all arch maintianers and memory maintainers, if I missed someone, please let me know). OK, as promised, I've got a working VM percpu solution. A little history: please read the thread http://lkml.org/lkml/2006/4/14/137 Preivously I noticed that the per_cpu variables were implemented with a little hack that Rusty Russell wrote up. I call it a hack because, for modules an arbitrary number was used to hold all per_cpu variables that would be used in the kernel or in modules. If this number was too big, you waste the memory that is allocated for it, and if it were too small, you wont be able to load all the modules that are required. My first attempt to fix this introduced another dereference, to allow for modules to allocate their own memory. This was quickly shot down, and for good reason, because dereferences kill performance, and don't play nice with large SMP systems that depend on per_cpu being fast. But that first attempt had one benefit. It had good responses on how to actually fix the problem. Without the first try, I would not have tried this approach. So what is this solution? I now place the per_cpu variables into VM, such that the pages are only allocated when needed. All the architecture needs to do is supply a VM address range, size for each CPU to use (note this implementation expects all the VM CPU areas to be together), and three functions to allow for allocating page tables at bootup. The bootup page table allocations are needed because the percpu variables are used before the system initializes the memory. So bootmem is needed to load the page tables. But that's it! No more hacks for architectures with lots of CPUS and NUMA to implement their own percpu algorithms. They just supply the functions to allocate the variables, and the memory will be loaded appropriately. So all architectures with VM can use the generic solution. Does the main-line kernel support any architectures that doesn't have a VM? Another benefit is that this approach removes the one redirection that was already used in the generic solution. Since the virtual memory of the architecture for the per_cpu is static, the calculation to find the variable is done with constants. So this is another win for this solution. So the three things that this patch gives us are: 1) Robust per_cpu. No more guessing the size of the per_cpu variable sections. Default VM is 1 Meg per cpu. If you use more than that I guess you're SOL. 2) Generic solution for all architectures. No more fancy hacks to get the per cpu offest. 3) Removes the inderection of the current per_cpu generic solution. The set of patches that I have implement this for the i386 architecture and lays out the ground work for others. The i386 implementation I did was mainly a hack. Nick Piggin suggested that before I do too much work, I should post my stuff and ask for comments. That's what I'm doing now. This patch set works for my laptop (P4 HT) and I haven't tested it with any other configuration. The implementation that I did for i386 was only to get it to work and is not flexible. It needs much better implementation, and I ask for help with that. I basically concentrated to the core stuff. Another nice thing about this patch set is that it needs both CONFIG_HAS_VM_PERCPU and __ARCH_HAS_VM_PERCPU to take advantage of the VM percpu implementation. Without these set, it defaults to the old usage. This way we can slowly implement each architecture, in an incremental fashion. OK, let'er rip. Flame me, yell at me, curse me, but please give me feedback. Thanks for your time. -- Steve PS. The attached module was used to test this patch on my laptop.
#include <linux/module.h> #include <linux/percpu.h> #define BIG_SIZE 4200 DEFINE_PER_CPU(int , bigarray[BIG_SIZE]); static int __init percpu_init(void) { int i, x; for (i=0; cpu_possible(i) ; i++) for (x=0; x < BIG_SIZE; x++) { if (!(x%100)) printk("processing %d:%d\n", i, x); per_cpu(bigarray[x], i) = 10000 + x; } return 0; } static void __exit percpu_exit(void) { int i; for (i=1; i < BIG_SIZE; i <<= 1) printk("bigarray[%d] = %d\n", i, per_cpu(bigarray[i],0)); } module_init(percpu_init); module_exit(percpu_exit); MODULE_AUTHOR("My name here"); MODULE_DESCRIPTION("percpu!"); MODULE_LICENSE("GPL");