On Tue, Apr 04, 2023 at 05:12:17PM +0200, Peter Zijlstra wrote:

> > case 2:
> > CPU-A                                     CPU-B
> >
> > modify pagetables
> > tlb_flush (memory barrier)
> >                                           state == CONTEXT_USER
> > int state = atomic_read(&ct->state);
> >                                           Kernel-enter:
> >                                           state == CONTEXT_KERNEL
> >                                           READ(pagetable values)
> > if (state & CT_STATE_MASK == CONTEXT_USER)

Hmm, hold up; what about memory ordering? We need store-load ordering
between the page-table write and the context tracking load, and
store-load ordering between the context tracking update and the
software page-table walker loads.

Now, iirc page-table modification is done under pte_lock (or
page_table_lock), and that only provides a RELEASE barrier on this
end, which is insufficient to order against a later load.

Is there anything else?

On the state tracking side we have ct_state_inc(), which is
atomic_add_return(); that should provide a full barrier and is
sufficient.
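
To make the concern concrete, here is a minimal user-space sketch of
the store-buffering pattern in question, using C11 atomics purely as
an analogy. This is not the kernel code; all names are made up, and
the seq_cst fences merely stand in for whatever full barriers the TLB
flush and ct_state_inc() actually provide:

/*
 * User-space analogy of the store-load ordering requirement above.
 * Not kernel code; names are illustrative only.
 */
#include <stdatomic.h>

#define CONTEXT_USER	0
#define CONTEXT_KERNEL	1

_Atomic int pagetable;			/* stand-in for a PTE */
_Atomic int ct_state = CONTEXT_USER;	/* stand-in for ct->state */

/* CPU-A: the flusher side */
int flusher_side(void)
{
	/* publish the page-table modification */
	atomic_store_explicit(&pagetable, 1, memory_order_relaxed);

	/*
	 * Full (store-load) barrier, assumed to come from the TLB flush
	 * itself; the pte_lock RELEASE alone would not order the store
	 * above against the load below.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	return atomic_load_explicit(&ct_state, memory_order_relaxed);
}

/* CPU-B: kernel entry on the remote CPU */
int remote_side(void)
{
	/*
	 * The state update; ct_state_inc() being atomic_add_return() is
	 * a fully ordered RMW, modeled here as a relaxed RMW followed by
	 * a seq_cst fence.
	 */
	atomic_fetch_add_explicit(&ct_state, CONTEXT_KERNEL,
				  memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);

	return atomic_load_explicit(&pagetable, memory_order_relaxed);
}

With a full barrier on both sides the store-buffering outcome is
excluded: it cannot happen that flusher_side() reads CONTEXT_USER (and
so skips the IPI) while remote_side() still reads the old page-table
value; at least one side must observe the other's store. Drop either
barrier and both sides can see stale values, which is exactly the
pte_lock-only concern above.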