Hi Andy,

Sorry for the slow reply, I was socially distanced from my keyboard.

On Mon, Dec 28, 2020 at 04:36:11PM -0800, Andy Lutomirski wrote:
> On Mon, Dec 28, 2020 at 4:11 PM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
> > > +static inline void membarrier_sync_core_before_usermode(void)
> > > +{
> > > +	/*
> > > +	 * XXX: I know basically nothing about powerpc cache management.
> > > +	 * Is this correct?
> > > +	 */
> > > +	isync();
> >
> > This is not about memory ordering or cache management, it's about
> > pipeline management. Powerpc's return to user mode serializes the
> > CPU (aka the hardware thread, _not_ the core; another wrongness of
> > the name, but AFAIKS the HW thread is what is required for
> > membarrier). So this is wrong, powerpc needs nothing here.
>
> Fair enough. I'm happy to defer to you on the powerpc details. In
> any case, this just illustrates that we need feedback from a person
> who knows more about ARM64 than I do.

I think we're in a very similar boat to PowerPC, fwiw. Roughly speaking:

  1. SYNC_CORE does _not_ perform any cache management; that is the
     responsibility of userspace, either by executing the relevant
     maintenance instructions (arm64) or a system call (arm32).
     Crucially, the hardware will ensure that this cache maintenance is
     broadcast to all other CPUs.

  2. Even with all the cache maintenance in the world, a CPU could have
     speculatively fetched stale instructions into its "pipeline" ahead
     of time, and these are _not_ flushed by the broadcast maintenance
     instructions in (1). SYNC_CORE provides a means for userspace to
     discard these stale instructions.

  3. The context synchronization event on exception entry/exit is
     sufficient here.
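For illustration only, here is a rough userspace sketch of points 1 and 2, as a JIT publishing freshly written code to its other threads might do it. The helper names jit_register_sync_core() and jit_publish_code() are hypothetical. It assumes Linux >= 4.16 uapi headers (for the SYNC_CORE membarrier commands) and leans on the GCC/Clang builtin __builtin___clear_cache() for the cache maintenance rather than open-coding DC/IC instructions. A process has to register for the SYNC_CORE command before issuing it, otherwise the kernel returns EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no membarrier() wrapper, so invoke the syscall directly.
 * The third argument (cpu_id) is only used with MEMBARRIER_CMD_FLAG_CPU. */
static int sys_membarrier(int cmd, unsigned int flags)
{
	return (int)syscall(__NR_membarrier, cmd, flags, 0);
}

/* Hypothetical helper: register this process for SYNC_CORE once, at
 * startup. Returns 0 on success, -1 with errno set if unsupported. */
static int jit_register_sync_core(void)
{
	int cmds = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);

	if (cmds < 0)
		return -1;
	if (!(cmds & MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE)) {
		errno = ENOSYS;
		return -1;
	}
	return sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}

/* Hypothetical helper: make instructions written to [start, start + len)
 * safe for other threads of this process to execute. */
static int jit_publish_code(void *start, size_t len)
{
	char *begin = start;

	/* Point 1: cache maintenance is userspace's job. On arm64 this ends
	 * up as DC CVAU / IC IVAU plus barriers, which the hardware
	 * broadcasts to the other CPUs; on x86 it compiles to nothing. */
	__builtin___clear_cache(begin, begin + len);

	/* Point 2: discard instructions that other CPUs may already have
	 * fetched. On arm64 the kernel relies on the context synchronization
	 * event at exception return to provide this. */
	return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}
```

Only the membarrier() call enters the kernel on arm64; on arm32 the builtin itself goes through the cacheflush system call, which matches the split described in point 1.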
The Arm ARM isn't very good at describing what it does, because it's in
denial about the existence of a pipeline, but it does have snippets such
as (s/PE/CPU/):

  | For all types of memory:
  | The PE might have fetched the instructions from memory at any time
  | since the last Context synchronization event on that PE.

Interestingly, the architecture recently added a control bit to remove
this synchronisation from exception return, so if we set that then we'd
have a problem with SYNC_CORE and adding an ISB would be necessary (and
we could probably then make kernel->kernel returns cheaper, but I suspect
we're relying on this implicit synchronisation in other places too).

Are you seeing a problem in practice, or did this come up while trying to
decipher the semantics of SYNC_CORE?

Will