On 20/11/24 16:32, Peter Zijlstra wrote:
> On Wed, Nov 20, 2024 at 04:22:16PM +0100, Peter Zijlstra wrote:
>> On Tue, Nov 19, 2024 at 04:35:00PM +0100, Valentin Schneider wrote:
>>
>> > +void noinstr __flush_tlb_all_noinstr(void)
>> > +{
>> > +	/*
>> > +	 * This is for invocation in early entry code that cannot be
>> > +	 * instrumented. A RMW to CR4 works for most cases, but relies on
>> > +	 * being able to flip either of the PGE or PCIDE bits. Flipping CR4.PCID
>> > +	 * would require also resetting CR3.PCID, so just try with CR4.PGE, else
>> > +	 * do the CR3 write.
>> > +	 *
>> > +	 * XXX: this gives paravirt the finger.
>> > +	 */
>> > +	if (cpu_feature_enabled(X86_FEATURE_PGE))
>> > +		__native_tlb_flush_global_noinstr(this_cpu_read(cpu_tlbstate.cr4));
>> > +	else
>> > +		native_flush_tlb_local_noinstr();
>> > +}
>>
>> Urgh, so that's a lot of ugleh, and cr4 has that pinning stuff and gah.
>>
>> Why not always just do the CR3 write and call it a day? That should also
>> work for paravirt, no? Just make the whole write_cr3 thing noinstr and
>> voila.
>
> Oh gawd, just having looked at xen_write_cr3() this might not be
> entirely trivial to mark noinstr :/

... I hadn't even seen that.

AIUI the CR3 RMW is not "enough" if we have PGE enabled, because then
global pages aren't flushed.

The question becomes: what is held in global pages and do we care about
that when it comes to vmalloc()? I'm starting to think no, but this is x86,
I don't know what surprises are waiting for me.

I see e.g. ds_clear_cea() clears PTEs that can have the _PAGE_GLOBAL flag,
and it correctly uses the non-deferrable flush_tlb_kernel_range().
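To spell out why the CR3 write alone isn't "enough" with PGE on, here is a toy userspace model of the two flush flavours (a sketch only, not kernel code). It assumes PCID is disabled, so a CR3 write invalidates every non-global TLB entry, while toggling CR4.PGE invalidates all entries, global ones included. All names (model_write_cr3(), model_toggle_cr4_pge(), etc.) are hypothetical and don't correspond to anything in arch/x86.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy TLB model; assumes PCID is off. Names are hypothetical. */
#define MODEL_TLB_ENTRIES 4

struct model_tlb_entry {
	bool valid;
	bool global;	/* models a PTE carrying _PAGE_GLOBAL */
};

static struct model_tlb_entry model_tlb[MODEL_TLB_ENTRIES];

/* Populate two non-global and two global cached translations. */
static void model_fill(void)
{
	for (int i = 0; i < MODEL_TLB_ENTRIES; i++) {
		model_tlb[i].valid = true;
		model_tlb[i].global = (i >= 2);
	}
}

/* CR3 write (e.g. CR3 RMW): drops only the non-global entries. */
static void model_write_cr3(void)
{
	for (int i = 0; i < MODEL_TLB_ENTRIES; i++)
		if (!model_tlb[i].global)
			model_tlb[i].valid = false;
}

/* Clearing then setting CR4.PGE: drops every entry, global included. */
static void model_toggle_cr4_pge(void)
{
	for (int i = 0; i < MODEL_TLB_ENTRIES; i++)
		model_tlb[i].valid = false;
}

/* Number of translations still cached (i.e. potentially stale). */
static int model_count_valid(void)
{
	int n = 0;

	for (int i = 0; i < MODEL_TLB_ENTRIES; i++)
		if (model_tlb[i].valid)
			n++;
	return n;
}

/* Check that the model behaves as described above. */
static void check_flush_semantics(void)
{
	model_fill();
	model_write_cr3();
	assert(model_count_valid() == 2);	/* the global entries survive */

	model_fill();
	model_toggle_cr4_pge();
	assert(model_count_valid() == 0);	/* nothing survives */
}
```

In other words, whether the CR3-only path is acceptable boils down to whether anything we care about (vmalloc() or otherwise) can sit behind a _PAGE_GLOBAL PTE when the deferred flush runs.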