On Sunday 16 April 2006 15:40, Steven Rostedt wrote:
> I'll think more about this, but maybe someone else has some crazy ideas
> that can find a solution to this that is both fast and robust.

Ok, you asked for a crazy idea, you're going to get it ;-)

You could take a fixed range from the vmalloc area (e.g. 1MB per cpu)
and use that to remap pages on demand when you need per cpu data.

#define PER_CPU_BASE 0xe000000000000000UL /* arch dependent */
#define PER_CPU_STRIDE 0x100000UL
#define __per_cpu_offset(__cpu) (PER_CPU_BASE + PER_CPU_STRIDE * (__cpu))
#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset(cpu)))
#define __get_cpu_var(var) per_cpu(var, smp_processor_id())

This is a lot like the current sparc64 implementation already is.

The tricky part here is the remapping of pages. You'd need to
alloc_pages_node() new pages whenever the already reserved space is
not enough for the module you want to load and then map_vm_area()
them into the space reserved for them.

Advantages of this solution are:
- no dependent load access for per_cpu()
- might be flexible enough to implement a faster per_cpu_ptr()
- can be combined with ia64-style per-cpu remapping

Disadvantages are:
- you can't use huge TLB entries for mapping per cpu data like the
  regular linear mapping does -> may be slower on some archs
- does not work in real mode, so percpu data can't be used
  inside exception handlers on some architectures.
- memory consumption is rather high when PAGE_SIZE is large

	Arnd <><
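
To illustrate the offset arithmetic in the macros above, here is a minimal
user-space sketch in C. It is not kernel code: all demo_* names, the 4 KB
stride and the struct layout are hypothetical, and a single calloc() stands
in for the reserved vmalloc range (no on-demand page mapping is modelled).
The only point is that the per-cpu address is computed as base + stride * cpu,
without loading an offset from a table first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical demo values, not kernel constants. */
#define DEMO_NR_CPUS        4
#define DEMO_PER_CPU_STRIDE 0x1000UL   /* 4 KB per CPU here; the mail suggests 1 MB */

/* Stands in for the per-cpu section that holds the variables. */
struct demo_percpu_area {
    long counter;
    int  flags;
};

/* Stands in for the reserved vmalloc range: one stride-sized slot per CPU. */
static unsigned char *demo_per_cpu_base;

/* Reserve the whole range up front (assumption: the kernel would instead
 * map pages into the range on demand). Returns 0 on success, -1 on failure. */
static int demo_per_cpu_init(void)
{
    demo_per_cpu_base = calloc(DEMO_NR_CPUS, DEMO_PER_CPU_STRIDE);
    return demo_per_cpu_base ? 0 : -1;
}

/* Same arithmetic as __per_cpu_offset(): base + stride * cpu, no table lookup. */
static unsigned char *demo_per_cpu_slot(unsigned int cpu)
{
    assert(cpu < DEMO_NR_CPUS);
    return demo_per_cpu_base + DEMO_PER_CPU_STRIDE * cpu;
}

/* Access one field of the per-CPU area belonging to the given CPU. */
#define demo_per_cpu(field, cpu) \
    (((struct demo_percpu_area *)demo_per_cpu_slot(cpu))->field)
```

After demo_per_cpu_init() succeeds, demo_per_cpu(counter, 2) = 5 writes only
to the slot of CPU 2; the slots of the other CPUs sit exactly one stride
apart and are left untouched.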