On 7/25/23 09:37, Marcelo Tosatti wrote:
>> TLB flushes for freed page tables are another game entirely.  The CPU is
>> free to cache any part of the paging hierarchy it wants at any time.
> Depend on CONFIG_PAGE_TABLE_ISOLATION=y, which flushes TLB (and page
> table caches) on user->kernel and kernel->user context switches ?

Well, first of all, CONFIG_PAGE_TABLE_ISOLATION doesn't flush the TLB at
all on user<->kernel switches when PCIDs are enabled.

Second, even if it did, the CPU is still free to cache any portion of
the paging hierarchy at any time.  Without LASS[1], userspace can even
_compel_ walks of the kernel portion of the address space, and we don't
have any infrastructure to tell if a freed kernel page is exposed in the
user copy of the page tables with PTI.

Third, (also ignoring PCIDs) there are plenty of instructions between
kernel entry and the MOV-to-CR3 that can flush the TLB.  All those
instructions are architecturally permitted to speculatively set Accessed
or Dirty bits in any part of the address space.  If they run into a
freed page table page, things get ugly.

These accesses are not _likely_.  There probably isn't a predictor out
there that's going to see a:

	movq	%rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)

and go off trying to dirty memory in the vmalloc() area.  But we'd need
some backward *and* forward-looking guarantees from our intrepid CPU
designers to promise that this kind of thing is safe yesterday, today
and tomorrow.  I suspect such a guarantee is going to be hard to obtain.

1. https://lkml.kernel.org/r/20230110055204.3227669-1-yian.chen@xxxxxxxxx