On Mon, Dec 28, 2020 at 11:09 AM Russell King - ARM Linux admin
<linux@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, Dec 28, 2020 at 07:29:34PM +0100, Jann Horn wrote:
> > After chatting with rmk about this (but without claiming that any of
> > this is his opinion), based on the manpage, I think membarrier()
> > currently doesn't really claim to be synchronizing caches? It just
> > serializes cores. So arguably if userspace wants to use membarrier()
> > to synchronize code changes, userspace should first do the code
> > change, then flush icache as appropriate for the architecture, and
> > then do the membarrier() to ensure that the old code is unused?
> >
> > For 32-bit arm, rmk pointed out that that would be the cacheflush()
> > syscall. That might cause you to end up with two IPIs instead of one
> > in total, but we probably don't care _that_ much about extra IPIs on
> > 32-bit arm?
> >
> > For arm64, I believe userspace can flush icache across the entire
> > system with some instructions from userspace - "DC CVAU" followed by
> > "DSB ISH", or something like that, I think? (See e.g.
> > compat_arm_syscall(), the arm64 compat code that implements the 32-bit
> > arm cacheflush() syscall.)
>
> Note that the ARM cacheflush syscall calls flush_icache_user_range()
> over the range of addresses that userspace has passed - it's intention
> since day one is to support cases where userspace wants to change
> executable code.
>
> It will issue the appropriate write-backs to the data cache (DCCMVAU),
> the invalidates to the instruction cache (ICIMVAU), invalidate the
> branch target buffer (BPIALLIS or BPIALL as appropriate), and issue
> the appropriate barriers (DSB ISHST, ISB).
>
> Note that neither flush_icache_user_range() nor flush_icache_range()
> result in IPIs; cache operations are broadcast across all CPUs (which
> is one of the minimums we require for SMP systems.)
>
> Now, that all said, I think the question that has to be asked is...
>
> What is the basic purpose of membarrier?
>
> Is the purpose of it to provide memory barriers, or is it to provide
> memory coherence?
>
> If it's the former and not the latter, then cache flushes are out of
> scope, and expecting memory written to be visible to the instruction
> stream is totally out of scope of the membarrier interface, whether
> or not the writes happen on the same or a different CPU to the one
> executing the rewritten code.
>
> The documentation in the kernel does not seem to describe what it's
> supposed to be doing - the only thing I could find is this:
> Documentation/features/sched/membarrier-sync-core/arch-support.txt
> which describes it as "arch supports core serializing membarrier"
> whatever that means.
>
> Seems to be the standard and usual case of utterly poor to non-existent
> documentation within the kernel tree, or even a pointer to where any
> useful documentation can be found.
>
> Reading the membarrier(2) man page, I find nothing in there that talks
> about any kind of cache coherency for self-modifying code - it only
> seems to be about _barriers_ and nothing more, and barriers alone do
> precisely nothing to save you from non-coherent Harvard caches.
>
> So, either Andy has a misunderstanding, or the man page is wrong, or
> my rudimentary understanding of what membarrier is supposed to be
> doing is wrong...

Look at the latest man page:

https://man7.org/linux/man-pages/man2/membarrier.2.html

for MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.  The result may not be
all that enlightening.

--Andy
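
For reference, the ordering Jann suggests in the quoted text (change the
code, flush the icache as the architecture requires, then membarrier())
might look roughly like the sketch below. It is not taken from the man
page or from any existing JIT. sys_membarrier(), jit_sync_init() and
sync_code_change() are hypothetical names, and using the compiler
builtin __builtin___clear_cache() for the flush step is an assumption;
on 32-bit ARM Linux toolchains it typically ends up in the cacheflush()
syscall, and on x86 it is a no-op. Whether the final membarrier() is
sufficient on every architecture is exactly the open question in this
thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/membarrier.h>   /* MEMBARRIER_CMD_* constants */
#include <stddef.h>
#include <sys/syscall.h>        /* __NR_membarrier */
#include <unistd.h>             /* syscall() */

/* Hypothetical thin wrapper: go through syscall(2) directly. */
static int sys_membarrier(int cmd, unsigned int flags, int cpu_id)
{
    return (int)syscall(__NR_membarrier, cmd, flags, cpu_id);
}

/*
 * Hypothetical helper: call once per process before the first
 * sync_code_change(). The SYNC_CORE command fails with EPERM unless
 * the process has registered its intent to use it.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int jit_sync_init(void)
{
    int cmds = sys_membarrier(MEMBARRIER_CMD_QUERY, 0, 0);

    if (cmds < 0)
        return -1;              /* e.g. ENOSYS on old kernels */
    if (!(cmds & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE)) {
        errno = ENOTSUP;        /* kernel/arch lacks SYNC_CORE */
        return -1;
    }
    return sys_membarrier(
        MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0);
}

/*
 * Hypothetical helper: call after new instructions have been written
 * to [start, start + len). Step 1 (the code change itself) is the
 * caller's job.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int sync_code_change(void *start, size_t len)
{
    char *begin = start;

    assert(start != NULL);

    /* Step 2: architecture-appropriate icache flush (assumption:
     * the compiler builtin does the right thing for this target). */
    __builtin___clear_cache(begin, begin + len);

    /* Step 3: ask the kernel to core-serialize every thread of this
     * process before it next returns to user space. */
    return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0);
}
```

A JIT would call jit_sync_init() once at startup and then
sync_code_change(buf, len) after each patch, before letting any other
thread jump into buf.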