On Thu, Jul 16, 2015 at 12:27 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
> If we actually bit the bullet and implemented per-cpu mappings

That's not ever going to happen.

Per-cpu page tables are a complete disaster. It's a recipe for crazy race conditions, when you have CPUs that update things like dirty/accessed bits atomically etc, and you have fundamental races when multiple CPUs are allocating page tables at the same time (remember: we have concurrent page faults, and the locking is not per-vm, it's at a finer granularity).

It's also a big memory management problem when you have lots and lots of CPUs.

So don't go there. The only way to do per-cpu virtual mappings is hardware-specific, ie if you have hardware that explicitly allows inserting percpu TLB entries (while still sharing the page tables), then that would be ok. And we don't have that on x86. MIPS has explicit support for these kinds of TLB hacks, and obviously on other architectures you might be able to play games with the SW-fill TLB, but on x86 there's no hardware support for per-CPU TLB filling.

And this is not just theory. We've seen what happens when people try to do per-thread page tables. It's happened several times, and it's a fundamental mistake. Plan-9 had "private mappings" because that's how they did stacks (ie the stack mappings were thread-local), and it means that thread switching is fundamentally broken. I think Mach did too. And per-cpu page tables are less broken from a scheduling standpoint than per-thread page tables, but still do share a lot of the synchronization problems, and have some allocation issues all their own.

The Linux VM model of "one page table per VM" is the right one. Anything else sucks, and makes threading a disaster.

So you can try to prove me wrong, but seriously, I doubt you'll succeed.

On x86, if you want per-cpu memory areas, you should basically plan on using segment registers instead (although other odd state has been used - there have been people who use segment limits etc rather than the *pointer* itself, preferring to use "lsl" to get percpu data. You could also imagine hiding things in the vector state somewhere if you control your environment well enough).

                Linus
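
As a rough illustration of the segment-register approach described above, here is a minimal userspace sketch in C. It only simulates the idea: every per-cpu area lives in the one shared address space, and a per-CPU base value (standing in for a segment base such as %gs) selects which copy a fixed offset refers to. All names here (`percpu_area`, `seg_base`, `switch_to_cpu`, `this_cpu_irq_count`, `NR_CPUS`) are hypothetical and invented for the example; this is not how the kernel's percpu code is actually written.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Layout of one per-cpu area; each CPU gets its own copy. */
struct percpu_area {
    int cpu_id;
    unsigned long irq_count;
};

/* All areas sit in the single shared address space: no per-cpu
 * page tables are involved. */
static struct percpu_area areas[NR_CPUS];

/* Stand-in for the segment base. On real hardware every CPU has its
 * own copy of this register; here a plain variable simulates
 * "whichever CPU we are currently running on". */
static char *seg_base;

static void percpu_init(void)
{
    for (int i = 0; i < NR_CPUS; i++) {
        areas[i].cpu_id = i;
        areas[i].irq_count = 0;
    }
}

/* Simulate running on a given CPU by loading its base address. */
static void switch_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    seg_base = (char *)&areas[cpu];
}

/* Base-relative access: the offset is the same on every CPU,
 * only the base differs. */
static unsigned long *this_cpu_irq_count(void)
{
    return (unsigned long *)(seg_base +
                             offsetof(struct percpu_area, irq_count));
}

static int this_cpu_id(void)
{
    return *(int *)(seg_base + offsetof(struct percpu_area, cpu_id));
}
```

The point of the sketch is that the code computing `base + offset` is identical on every CPU, and only the base differs. That is why a per-CPU register is enough and the page tables can stay shared.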