On Fri, Mar 08, 2024 at 07:23:58AM -0800, Dave Hansen wrote: > On 3/7/24 17:34, Andy Lutomirski wrote: > >> Fix this by making sure we write a new CR3 if LAM is not > >> up-to-date. No problems were observed in practice, this was found > >> by code inspection. > > I think it should be fixed with a much bigger hammer: explicit IPIs. > > Just don't ever let it get out of date, like install_ldt(). > I guess it matters whether the thing that matters is having a persistent > inconsistency or a temporary one. IPIs will definitely turn a permanent > one into a temporary one. > > But this is all easier to reason about if we can get rid of even the > temporary inconsistency. > > Wouldn't this be even simpler than IPIs? > > static inline unsigned long set_tlbstate_lam_mode(struct mm_struct *mm) > { > unsigned long lam = READ_ONCE(mm->context.lam_cr3_mask); > > + /* LAM is for userspace only. Ignore it for kernel threads: */ > + if (tsk->flags & PF_KTHREAD) > + return 0; > > this_cpu_write(cpu_tlbstate.lam, lam >> X86_CR3_LAM_U57_BIT); > this_cpu_write(tlbstate_untag_mask, mm->context.untag_mask); > return lam; > } Hmm I don't see how this fixes the problem addressed by this patch. I think this fixes the problem addressed by patch 1, where CR3 and cpu_tlbstate.lam may get out of sync if LAM enablement races with switch_mm_irqs_off(). However, this patch is fixing a deeper problem (an actual bug). Precisely this situation: CPU 1 CPU 2 /* kthread */ kthread_use_mm() /* user thread */ prctl_enable_tagged_addr() /* LAM enabled */ context_switch() /* to CPU 1 */ switch_mm_irqs_off() /* user thread */ ---> LAM is disabled here <--- When switch_mm_irqs_off() runs on CPU 1 to switch from the kthread to the user thread, because the mm is not actually changing, we may not write CR3. In this case, the user thread runs on CPU 1 with LAM disabled, right? The IPI would fix this problem because prctl_enable_tagged_addr() will make sure that CPU 1 enables LAM before it returns to userspace. Alternatively, this patch fixes the problem by making sure we write CR3 in switch_mm_irqs_off() if LAM is out-of-date. I don't see how skipping set_tlbstate_lam_mode() for kthreads fixes this problem. Do you mind elaborating?