On Thu, Oct 21, 2021 at 10:18:59PM +0200, Peter Zijlstra wrote: > On Thu, Oct 21, 2021 at 04:57:09PM -0300, Marcelo Tosatti wrote: > > > Pretty much everything in noinstr is magical, we just have to think > > > harder there (and possibly start writing more comments there). > > > > mds_user_clear_cpu_buffers happens after sync_core, in your patchset, > > if i am not mistaken. > > Of course it does, mds_user_clear_cpu_buffers() is on exit, the > sync_core() is on entry. static_key enable/disable __exit_to_user_mode -> context_tracking_set_cpu_work(cpu, work) user_enter_irqoff -> preempt_disable(); __context_tracking_enter(CONTEXT_USER); seq = atomic_read(&ct->seq); ct_seq_user_enter(raw_cpu_ptr(&context_tracking)); if (__context_tracking_seq_in_user(seq)) { { /* ctrl-dep */ arch_atomic_set(&ct->work, 0); atomic_or(work, &ct->work); return arch_atomic_add_return(CT_SEQ_USER, &ct->seq); ret = atomic_try_cmpxchg(&ct->seq, &seq, seq|CT_SEQ_WORK); } } preempt_enable(); arch_exit_to_user_mode() mds_user_clear_cpu_buffers(); <--- sync_core work queued, but not executed. i-cache potentially stale? ct_seq_user_enter should happen _after_ all possible static_key users? (or recheck that there is no pending work after any possible rewritable code/static_key user). > > > > > > + /* NMI happens here and must still do/finish CT_WORK_n */ > > > > > + sync_core(); > > > > > > > > But after the discussion with you, it seems doing the TLB checking > > > > and (also sync_core) checking very late/very early on exit/entry > > > > makes things easier to review. > > > > > > I don't know about late, it must happen *very* early in entry. The > > > sync_core() must happen before any self-modifying code gets called > > > (static_branch, static_call, etc..) with possible exception of the > > > context_tracking static_branch. > > > > > > The TLBi must also happen super early, possibly while still on the > > > entry stack (since the task stack is vmap'ed). > > > > But will it be ever be freed/remapped from other CPUs while the task > > is running? > > Probably not, still something we need to be really careful with. > > > > > We currently don't run C > > > code on the entry stack, that needs quite a bit of careful work to make > > > happen. > > > > Was thinking of coding in ASM after (as early as possible) the write to > > switch to kernel CR3: > > No, we're not going to add new feature to ASM. You'll just have to wait > until all that gets lifted to C. > > > Kernel entry: > > ------------- > > > > cpu = smp_processor_id(); > > > > if (isolation_enabled(cpu)) { > > reqs = atomic_xchg(&percpudata->user_kernel_state, IN_KERNEL_MODE); > > if (reqs & CPU_REQ_FLUSH_TLB) > > flush_tlb_all(); > > if (reqs & CPU_REQ_SYNC_CORE) > > sync_core(); > > } > > > > Exit to userspace (as close to write to CR3 with user pagetable > > pointer): > > ----------------- > > > > cpu = smp_processor_id(); > > > > if (isolation_enabled(cpu)) { > > atomic_or(IN_USER_MODE, &percpudata->user_kernel_state); > > } > > > > You think that is a bad idea (in ASM, not C) ? > > Those atomics are a bad idea and not goig to happen. > > > > We're not going to add an atomic to context tracking. There is one, we > > > just got to extract/share it with RCU. > > > > Again, to avoid kernel TLB flushes you'd have to ensure: > > I know how it works, but we're not going to add a second atomic to > entry/exit. RCU has one in there, that's going to be it. Again, we just > got to extract/share.