On Mon, Jul 24, 2023 at 10:40:04AM -0700, Dave Hansen wrote:
> On 7/24/23 04:32, Valentin Schneider wrote:
> > AFAICT the only reasonable way to go about the deferral is to prove that no
> > such access happens before the deferred @operation is done. We got to prove
> > that for sync_core() deferral, cf. PATCH 18.
> >
> > I'd like to reason about it for deferring vunmap TLB flushes:
> >
> > What addresses in VMAP range, other than the stack, can early entry code
> > access? Yes, the ranges can be checked at runtime, but is there any chance
> > of figuring this out e.g. at build-time?
>
> Nadav was touching on a very important point: TLB flushes for addresses
> are relatively easy to defer. You just need to ensure that the CPU
> deferring the flush does an actual flush before it might architecturally
> consume the contents of the flushed entry.
>
> TLB flushes for freed page tables are another game entirely. The CPU is
> free to cache any part of the paging hierarchy it wants at any time.

Could this depend on CONFIG_PAGE_TABLE_ISOLATION=y, which flushes the TLB
(and page table caches) on user->kernel and kernel->user context switches?

Then freeing a kernel pagetable page would not require interrupting a CPU
which is in userspace (and therefore does not have visibility into kernel
pagetables).

> It's also free to set accessed and dirty bits at any time, even for
> instructions that may never execute architecturally.
>
> That basically means that if you have *ANY* freed page table page
> *ANYWHERE* in the page table hierarchy of any CPU at any time ... you're
> screwed.
>
> There's no reasoning about accesses or ordering. As soon as the CPU
> does *anything*, it's out to get you.
>
> You're going to need to do something a lot more radical to deal with
> free page table pages.
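
To make the CONFIG_PAGE_TABLE_ISOLATION question above concrete, here is a
toy, single-threaded userspace model in C. It is not kernel code: every
name in it (struct cpu_model, exit_to_user(), enter_kernel(),
kernel_touch(), free_kernel_pgtable_page()) is hypothetical. It takes for
granted the two things the question asks about, namely that both
transitions really flush the TLB and the paging-structure caches, and that
a CPU in userspace cannot cache kernel page table entries. It also ignores
the race between the free and a CPU that is entering the kernel at that
moment. It only shows which CPUs would be skipped if those assumptions
held.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-CPU model state; none of these are kernel symbols. */
struct cpu_model {
	bool in_user;            /* CPU is running userspace */
	bool may_cache_kpgtable; /* CPU may hold cached kernel paging entries */
	unsigned int flushes;    /* full flushes performed on this CPU */
	unsigned int ipis;       /* flush IPIs delivered to this CPU */
};

/* Zero-initialized: every CPU starts in the kernel with nothing cached. */
static struct cpu_model cpus[NR_CPUS];

/* Drop everything this CPU may have cached and count the flush. */
static void flush_all(int cpu)
{
	cpus[cpu].may_cache_kpgtable = false;
	cpus[cpu].flushes++;
}

/* ASSUMPTION: the kernel->user transition flushes TLB and paging caches. */
static void exit_to_user(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	flush_all(cpu);
	cpus[cpu].in_user = true;
}

/* ASSUMPTION: the user->kernel transition flushes as well. */
static void enter_kernel(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	flush_all(cpu);
	cpus[cpu].in_user = false;
}

/* Any kernel-mode activity may pull kernel paging entries into the caches. */
static void kernel_touch(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	assert(!cpus[cpu].in_user);
	cpus[cpu].may_cache_kpgtable = true;
}

/* Free a kernel page-table page; returns the number of IPIs sent. */
static unsigned int free_kernel_pgtable_page(void)
{
	unsigned int sent = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpus[cpu].in_user)
			continue; /* skipped: relies on the assumptions above */
		cpus[cpu].ipis++;
		flush_all(cpu);
		sent++;
	}
	return sent;
}
```

With four modeled CPUs, one of which has called exit_to_user(),
free_kernel_pgtable_page() returns 3 and the userspace CPU receives no
IPI. Whether real hardware honors the two assumptions is exactly what the
quoted text disputes, so this is only a picture of the proposal, not an
argument that it is safe.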