Just two quick remarks; it's far too late to really think :-)

On Thu, Oct 27, 2022 at 11:13:55AM -0700, Linus Torvalds wrote:

> But "fullmm" is probably even stronger than "mmap write-lock" in that
> it should also mean "no other CPU can be actively using this" - either
> for hardware page table walking, or for GUP.

IIRC fullmm is really: this is the last user and we're taking the whole
mm down -- IOW exit().

> For example, MADV_DONTNEED does this all with just the mmap lock held
> for reading, so we *unless* we have that 'force_flush', we can
>
>  (a) have another CPU continue to use the old stale TLB entry for quite a while
>
>  (b) yet another CPU (that didn't have a TLB entry, or wanted to write
> to a read-only one ) could take a page fault, and install a *new* PTE
> entry in the same slot, all at the same time.
>
> Now, that's clearly *very* confusing. But being confusing may not mean
> "wrong" - we're still delaying the free of the old entry, so there's
> no use-after-free.

Do we worry about CPU errata where things go sideways if multiple CPUs
have inconsistent TLB state?
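The delayed-free argument in the quoted text (a stale TLB entry is harmless as long as the page it points to is only freed after the TLB flush) can be illustrated with a toy model. This is a hypothetical sketch only: `Page`, `Gather`, `ToyMM`, `zap()` and `finish()` are made-up names, not the kernel's mmu_gather API, and the model deliberately ignores force_flush, GUP and any real hardware behaviour, including the errata question above.

```python
# Toy model of "clear the PTE, flush TLBs, only then free the page".
# All names are hypothetical; this is NOT the kernel's mmu_gather API.
from dataclasses import dataclass, field

NR_CPUS = 2

@dataclass
class Page:
    data: int
    freed: bool = False

@dataclass
class Gather:
    pages: list = field(default_factory=list)  # pages queued for freeing

class ToyMM:
    def __init__(self, page):
        self.pte = page                # the single page-table slot
        self.tlb = [None] * NR_CPUS    # per-CPU cached translation

    def access(self, cpu):
        if self.tlb[cpu] is None:
            self.tlb[cpu] = self.pte   # "hardware walk" on a TLB miss
        page = self.tlb[cpu]
        if page is None:
            return None                # would be a page fault
        assert not page.freed, "use-after-free via stale TLB entry"
        return page.data

    def zap(self, gather):
        # Clear the PTE but only queue the page; it is not freed yet.
        gather.pages.append(self.pte)
        self.pte = None

    def finish(self, gather):
        # Invalidate every CPU's TLB first, then free the queued pages.
        self.tlb = [None] * NR_CPUS
        for page in gather.pages:
            page.freed = True
        gather.pages.clear()

def scenario():
    old, new = Page(1), Page(2)
    mm, g = ToyMM(old), Gather()
    mm.access(1)           # CPU 1 caches the old translation
    mm.zap(g)              # CPU 0 zaps the range (MADV_DONTNEED-style)
    stale = mm.access(1)   # (a) CPU 1 keeps using its stale entry
    mm.pte = new           # (b) a fault installs a new PTE in the same slot
    mm.finish(g)           # flush all TLBs, then free the old page
    fresh = mm.access(1)   # CPU 1 now refills from the new PTE
    return stale, fresh, old.freed

print(scenario())  # (1, 2, True)
```

The stale read in (a) returns the old data without tripping the assertion because the old page is still allocated at that point. Freeing the page before the flush would make the same access fail the assertion, which is the use-after-free the ordering exists to prevent.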