On Thu, Mar 07, 2024 at 09:56:07AM -0800, Dave Hansen wrote:
> On 3/7/24 09:29, Kirill A. Shutemov wrote:
> > On Thu, Mar 07, 2024 at 01:39:15PM +0000, Yosry Ahmed wrote:
> >> During context switching, if we are not switching to new mm and no TLB
> >> flush is needed, we do not write CR3. However, it is possible that a
> >> user thread enables LAM while a kthread is running on a different CPU
> >> with the old LAM CR3 mask. If the kthread context switches into any
> >> thread of that user process, it may not write CR3 with the new LAM mask,
> >> which would cause the user thread to run with a misconfigured CR3 that
> >> disables LAM on the CPU.
> > I don't think it is possible. As I said we can only enable LAM when the
> > process has single thread. If it enables LAM concurrently with kernel
> > thread and kernel thread gets control on the same CPU after the userspace
> > thread of the same process LAM is already going to be enabled. No need in
> > special handling.
>
> I think it's something logically like this:
>
> 	// main thread
> 	kthread_use_mm()
> 	cr3 |= mm->lam_cr3_mask;
> 	mm->lam_cr3_mask = foo;
> 	cpu_tlbstate.lam = mm->lam_cr3_mask;

IIUC it doesn't have to be through kthread_use_mm(). If we context switch
directly from the user thread to a kthread, the kthread will keep using
the user thread's mm AFAICT.

>
> Obviously the kthread's LAM state is going to be random. It's
> fundamentally racing with the enabling thread. That part is fine.
>
> The main pickle is the fact that CR3 and cpu_tlbstate.lam are out of
> sync. That seems worth fixing.

That's what is fixed by patch 1, specifically a race between
switch_mm_irqs_off() and LAM being enabled.
This patch is fixing a different problem:

CPU 1                                   CPU 2
/* user thread running */
context_switch() /* to kthread */
                                        /* user thread enables LAM */
                                        context_switch()
context_switch() /* to user thread */

In this case, there are no races, but the second context switch on CPU 1
may not write CR3 (if TLB is up-to-date), in which case we will run the
user thread with CR3 having the wrong LAM mask. This could cause bigger
problems, right?

>
> Or is there something else that keeps this whole thing from racing in
> the first place?

+1 that would be good to know, but I didn't find anything.
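
To make the CPU 1 / CPU 2 sequence concrete, here is a small userspace toy
model of it. This is only a sketch under my own assumptions, not kernel
code: struct toy_mm, struct toy_cpu, TOY_LAM_BIT, toy_switch_mm() and
toy_scenario() are all hypothetical names, and the recheck_lam knob just
illustrates one possible way to notice the stale mask, not necessarily
what the patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, not a kernel symbol. */
#define TOY_LAM_BIT (1ULL << 61)	/* arbitrary stand-in for a LAM bit in CR3 */

struct toy_mm {
	uint64_t pgd;		/* stand-in for the page table base */
	uint64_t lam_cr3_mask;	/* LAM bits that should be in CR3 */
	uint64_t tlb_gen;	/* bumped whenever a flush is required */
};

struct toy_cpu {
	uint64_t cr3;			/* value the "hardware" is using */
	const struct toy_mm *loaded_mm;	/* mm last loaded on this CPU */
	uint64_t seen_tlb_gen;		/* TLB generation this CPU caught up to */
};

/*
 * Models the fast path described above: same mm and up-to-date TLB means
 * CR3 is not written. With recheck_lam set, a stale LAM mask also forces
 * the write. Returns true if CR3 was written.
 */
static bool toy_switch_mm(struct toy_cpu *cpu, const struct toy_mm *next,
			  bool recheck_lam)
{
	bool same_mm = cpu->loaded_mm == next;
	bool tlb_current = cpu->seen_tlb_gen == next->tlb_gen;
	bool lam_stale = (cpu->cr3 & TOY_LAM_BIT) != next->lam_cr3_mask;

	if (same_mm && tlb_current && !(recheck_lam && lam_stale))
		return false;	/* fast path: CR3 left untouched */

	cpu->cr3 = next->pgd | next->lam_cr3_mask;
	cpu->loaded_mm = next;
	cpu->seen_tlb_gen = next->tlb_gen;
	return true;
}

/* Replays the diagram; returns true if CPU 1 ends up with LAM set in CR3. */
static bool toy_scenario(bool recheck_lam)
{
	struct toy_mm mm = { .pgd = 0x1000, .lam_cr3_mask = 0, .tlb_gen = 1 };
	struct toy_cpu cpu1 = { 0 };

	toy_switch_mm(&cpu1, &mm, recheck_lam);	/* CPU 1: user thread running */
	/* CPU 1: switch to kthread, which keeps using mm; nothing to model */
	mm.lam_cr3_mask = TOY_LAM_BIT;		/* CPU 2: user thread enables LAM */
	toy_switch_mm(&cpu1, &mm, recheck_lam);	/* CPU 1: back to a user thread */

	return (cpu1.cr3 & TOY_LAM_BIT) != 0;
}
```

In this model toy_scenario(false) leaves CPU 1 without the LAM bit in CR3
even though nothing raced, while toy_scenario(true) picks it up on the
second switch.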