On Fri, Jul 17, 2015 at 11:55 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
> I doubt I'll succeed, too.  But I don't want anything resembling full
> per-cpu page tables -- per-cpu pgds would be plenty.  Still kinda
> nasty to implement.

Per-cpu pgd's would have been trivial in the old 32-bit PAE
environment. There's only four entries at the top level, and they have
to be allocated at process startup anyway - and we wouldn't even have
to do a per-cpu-and-VM allocation, we'd just have done one single
per-cpu entry, and when switching tasks we'd *copy* the VM entries to
the per-cpu one and re-load %cr3 with the same address.

I thought about it. But I'm really happy we never went down that road.
It's non-portable, even on x86-32 (because it requires PAE). And even
there it would be limited to "the top 1GB of virtual address space
ends up being per-cpu", and then you have to get the vmalloc space
right etc, so you have that one PGE entry for the kernel mapping that
you can make be percpu and play tricks in.

So you'd basically allocate one page per CPU for the magic upper PGD
entry that maps the top 1GB, and edit that on-the-fly as you do
task-switching. Very specialized, and the upside was very dubious.

And that "simple" trick is not really doable with the x86-64 model any
more (you can't copy 4kB efficiently the way you could copy 32 _bytes_
efficiently). And you really don't want to pre-allocate the whole
top-level PGD either. So all the things that made it "easy" for 32-bit
PAE basically went away with x86-64.

No, I think the only thing that would make it possible is if there is
some architecture extension that replaces part of the page table
mappings with a percpu MSR describing a magic mapping or two. It would
be trivial to do such an addition in hardware (it's not even in the
critical path, it would be just a new magic special case for the TLB
fill code), but without hardware support it's just not a good idea.
(And I'm not claiming that the hw extension for per-cpu mappings would
be a good idea either, although I think it would be an _interesting_
toy to play with ;)

                 Linus
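
For readers who want to see the size argument concretely, here is a
rough user-space toy model of the 32-bit PAE trick described above. It
is only a sketch under stated assumptions, not kernel code: the names
toy_mm, toy_cpu, toy_cpu_init and toy_switch_mm are hypothetical, a
counter stands in for re-loading %cr3, and the choice to copy only the
three user entries (leaving the top-1GB slot per-cpu) is one possible
reading of the scheme rather than something the message spells out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: all names below are hypothetical, not kernel APIs. */
#define PAE_PGD_ENTRIES     4   /* 32-bit PAE top level: 4 entries, 1GB each */
#define PAE_USER_ENTRIES    3   /* assumed split: bottom 3GB belongs to the task */
#define X86_64_PGD_ENTRIES  512 /* x86-64 top level: 512 entries */

typedef uint64_t pgd_entry_t;

/* The whole PAE top level is 32 bytes; the x86-64 one is a full 4kB page. */
static_assert(PAE_PGD_ENTRIES * sizeof(pgd_entry_t) == 32,
              "PAE top level is 32 bytes");
static_assert(X86_64_PGD_ENTRIES * sizeof(pgd_entry_t) == 4096,
              "x86-64 top level is 4kB");

/* Per-task top-level entries (what each VM would own). */
struct toy_mm {
    pgd_entry_t pgd[PAE_PGD_ENTRIES];
};

/* The single per-cpu top level that the fake %cr3 always points at. */
struct toy_cpu {
    pgd_entry_t pgd[PAE_PGD_ENTRIES];
    unsigned long cr3_reloads;  /* stand-in for re-loading %cr3 */
};

/* Install the per-cpu entry for the top 1GB once, at CPU bring-up. */
static void toy_cpu_init(struct toy_cpu *cpu, pgd_entry_t percpu_top)
{
    memset(cpu, 0, sizeof(*cpu));
    cpu->pgd[PAE_PGD_ENTRIES - 1] = percpu_top;
}

/*
 * Task switch: copy the task's user entries into the per-cpu top level
 * and "re-load %cr3 with the same address". The per-cpu top entry is
 * deliberately left untouched.
 */
static void toy_switch_mm(struct toy_cpu *cpu, const struct toy_mm *next)
{
    memcpy(cpu->pgd, next->pgd, PAE_USER_ENTRIES * sizeof(pgd_entry_t));
    cpu->cr3_reloads++;
}
```

The two static_assert lines carry the actual point: the copy on every
task switch is a handful of bytes with PAE, while the equivalent on
x86-64 would touch an entire 4kB top-level table.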