On Mon, Sep 21, 2020 at 09:27:57PM +0200, Thomas Gleixner wrote:

> Alternatively this could of course be solved with per CPU page tables
> which will come around some day anyway I fear.

Previously (with PTI) we looked at making the entire kernel map per-CPU,
and that takes a 2K copy on switch_mm() (or, more generally, a copy of
the user part of whatever the top-level directory is, for architectures
that have a shared kernel/user page-table setup in the first place).

The idea was to have a fixed per-CPU kernel page-table, share a bunch of
(kernel) page-tables between all CPUs, and then copy in the user part on
switch.

I've forgotten what the plan was for ASID/PCID in that scheme.

For x86_64 we've been fearing the performance of that 2K copy, but I
don't think we've ever actually bitten the bullet and implemented it to
see how bad it really is.
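
To make the shape of that scheme concrete, here is a rough user-space
sketch, not kernel code. It assumes the x86_64 4-level layout (512
top-level entries of 8 bytes, with the lower 256 covering user space,
hence 2K). The names struct cpu_pgd, struct mm_sketch, cpu_pgd_init()
and switch_mm_sketch() are made up for illustration, and it ignores
TLB flushing and ASID/PCID handling entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed x86_64 4-level layout: 512 top-level entries of 8 bytes each. */
#define PTRS_PER_PGD     512
#define USER_PGD_PTRS    (PTRS_PER_PGD / 2)	/* lower half: 256 * 8 = 2K */
#define KERNEL_PGD_PTRS  (PTRS_PER_PGD - USER_PGD_PTRS)

typedef uint64_t pgd_t;

/* Hypothetical: per-process state reduced to the user half of the PGD. */
struct mm_sketch {
	pgd_t user_pgd[USER_PGD_PTRS];
};

/* Hypothetical: one fixed top-level table per CPU. */
struct cpu_pgd {
	pgd_t pgd[PTRS_PER_PGD];
};

/*
 * Fill the kernel half once. Most entries point at page-tables shared
 * by all CPUs; one slot is overridden with a CPU-private entry.
 */
static void cpu_pgd_init(struct cpu_pgd *cpu, const pgd_t *shared_kernel,
			 unsigned int percpu_slot, pgd_t percpu_entry)
{
	assert(percpu_slot >= USER_PGD_PTRS && percpu_slot < PTRS_PER_PGD);

	memset(cpu->pgd, 0, USER_PGD_PTRS * sizeof(pgd_t));
	memcpy(&cpu->pgd[USER_PGD_PTRS], shared_kernel,
	       KERNEL_PGD_PTRS * sizeof(pgd_t));
	cpu->pgd[percpu_slot] = percpu_entry;
}

/* The 2K copy: replace only the user half, leave the kernel half alone. */
static void switch_mm_sketch(struct cpu_pgd *cpu, const struct mm_sketch *next)
{
	memcpy(cpu->pgd, next->user_pgd, sizeof(next->user_pgd));
}
```

The cost in question is the memcpy() in switch_mm_sketch(), paid on every
address-space switch instead of a single pointer load into CR3.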