On Mon, Dec 28, 2020 at 9:23 AM Russell King - ARM Linux admin
<linux@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, Dec 28, 2020 at 09:14:23AM -0800, Andy Lutomirski wrote:
> > On Mon, Dec 28, 2020 at 2:25 AM Russell King - ARM Linux admin
> > <linux@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Sun, Dec 27, 2020 at 01:36:13PM -0800, Andy Lutomirski wrote:
> > > > On Sun, Dec 27, 2020 at 12:18 PM Mathieu Desnoyers
> > > > <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > ----- On Dec 27, 2020, at 1:28 PM, Andy Lutomirski luto@xxxxxxxxxx wrote:
> > > > >
> > > > > >
> > > > > > I admit that I'm rather surprised that the code worked at all on arm64,
> > > > > > and I'm suspicious that it has never been very well tested. My apologies
> > > > > > for not reviewing this more carefully in the first place.
> > > > >
> > > > > Please refer to Documentation/features/sched/membarrier-sync-core/arch-support.txt
> > > > >
> > > > > It clearly states that only arm, arm64, powerpc and x86 support the membarrier
> > > > > sync core feature as of now:
> > > >
> > > > Sigh, I missed arm (32). Russell or ARM folks, what's the right
> > > > incantation to make the CPU notice instruction changes initiated by
> > > > other cores on 32-bit ARM?
> > >
> > > You need to call flush_icache_range(), since the changes need to be
> > > flushed from the data cache to the point of unification (of the Harvard
> > > I and D), and the instruction cache needs to be invalidated so it can
> > > then see those updated instructions. This will also take care of the
> > > necessary barriers that the CPU requires for you.
> >
> > With what parameters? From looking at the header, this is for the
> > case in which the kernel writes some memory and then intends to
> > execute it. That's not what membarrier() does at all. membarrier()
> > works like this:
>
> You didn't specify that you weren't looking at kernel memory.
>
> If you're talking about userspace, then the interface you require
> is flush_icache_user_range(), which does the same as
> flush_icache_range() but takes userspace addresses. Note that this
> requires that the memory is currently mapped at those userspace
> addresses.
>
> If that doesn't fit your needs, there isn't an interface to do what
> you require, and it basically means creating something brand new
> on every architecture.
>
> What you are asking for is not "just a matter of a few instructions".
> I have stated the required steps to achieve what you require above;
> that is the minimum when you have non-snooping harvard caches, which
> the majority of 32-bit ARMs have.
>
> > User thread 1:
> >
> > write to RWX memory *or* write to an RW alias of an X region.
> > membarrier(...);
> > somehow tell thread 2 that we're ready (with a store release, perhaps,
> > or even just a relaxed store.)
> >
> > User thread 2:
> >
> > wait for the indication from thread 1.
> > barrier();
> > jump to the code.
> >
> > membarrier() is, for better or for worse, not given a range of addresses.
>
> Then, I'm sorry, it can't work on 32-bit ARM.

Is there a way to flush the *entire* user icache? If so, and if it
has reasonable performance, then it could probably be used here.
Otherwise I'll just send a revert for this whole mechanism on 32-bit
ARM.

--Andy
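
For reference, the two-thread pattern quoted above can be sketched in
userspace C roughly as follows. This is only an illustration under
stated assumptions, not code from this thread: sys_membarrier(),
register_sync_core(), publish_code(), wait_for_code() and code_ready
are hypothetical names, and the sketch adds an explicit ranged
__builtin___clear_cache() call to show where an address range would
have to come from. As far as I know, on 32-bit ARM Linux that builtin
ends up in the cacheflush syscall, i.e. a ranged flush of the kind
Russell describes; that is an assumption about the toolchain. Whether
the membarrier() call alone is sufficient on 32-bit ARM is exactly the
open question here.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: invoke membarrier(2) through syscall(2). */
static int sys_membarrier(int cmd, unsigned int flags, int cpu_id)
{
    return (int)syscall(__NR_membarrier, cmd, flags, cpu_id);
}

/* Handoff flag between the writer (thread 1) and the executor (thread 2). */
static atomic_int code_ready;

/* The SYNC_CORE command must be registered once per process before use. */
static int register_sync_core(void)
{
    return sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE,
                          0, 0);
}

/* Thread 1: write the instructions, synchronize, then signal readiness. */
static int publish_code(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);

    /* Ranged flush: takes the addresses that membarrier() never sees. */
    __builtin___clear_cache((char *)dst, (char *)dst + len);

    /* Range-less step from the quoted pattern; may fail (e.g. EPERM if
     * unregistered, EINVAL if unsupported), so the result is returned. */
    int rc = sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0);

    atomic_store_explicit(&code_ready, 1, memory_order_release);
    return rc;
}

/* Thread 2: wait for the indication; the caller then jumps to the code. */
static void wait_for_code(void)
{
    while (!atomic_load_explicit(&code_ready, memory_order_acquire))
        ; /* spin */
}
```

Thread 1 would call register_sync_core() once at startup and
publish_code() after each modification; thread 2 calls wait_for_code()
and then branches to the destination buffer, which has to be mapped
executable. The jump itself is left out because it is
architecture-specific.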