----- On Jul 16, 2020, at 7:00 AM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:

> On Thu, Jul 16, 2020 at 08:03:36PM +1000, Nicholas Piggin wrote:
>> Excerpts from Peter Zijlstra's message of July 16, 2020 6:50 pm:
>> > On Wed, Jul 15, 2020 at 10:18:20PM -0700, Andy Lutomirski wrote:
>> >> > On Jul 15, 2020, at 9:15 PM, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
>> >> But I’m wondering if all this deferred sync stuff is wrong. In the
>> >> brave new world of io_uring and such, perhaps kernel access matter
>> >> too. Heck, even:
>> >
>> > IIRC the membarrier SYNC_CORE use-case is about user-space
>> > self-modifying code.
>> >
>> > Userspace re-uses a text address and needs to SYNC_CORE before it can be
>> > sure the old text is forgotten. Nothing the kernel does matters there.
>> >
>> > I suppose the manpage could be more clear there.
>>
>> True, but memory ordering of kernel stores from kernel threads for
>> regular mem barrier is the concern here.
>>
>> Does io_uring update completion queue from kernel thread or interrupt,
>> for example? If it does, then membarrier will not order such stores
>> with user memory accesses.
>
> So we're talking about regular membarrier() then? Not the SYNC_CORE
> variant per-se.
>
> Even there, I'll argue we don't care, but perhaps Mathieu has a
> different opinion.

I agree with Peter that we don't care about accesses to user-space
memory performed concurrently with membarrier.

What we'd care about in terms of accesses to user-space memory from the
kernel is something that would be clearly ordered as happening before or
after the membarrier call, for instance a read(2) followed by
membarrier(2) after the read returns, or a read(2) issued after return
from membarrier(2). The other scenario we'd care about is with the
compiler barrier paired with membarrier: e.g. read(2) returns, compiler
barrier, followed by a store. Or load, compiler barrier, followed by
write(2).

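As a rough illustration of the read(2)/membarrier(2) pairing described
above, here is a minimal C sketch, not a definitive implementation. The
names x_flag, sys_membarrier, compiler_barrier,
slow_side_read_then_publish and fast_side_try_write are made up for this
example. The choice of MEMBARRIER_CMD_GLOBAL is an assumption (the thread
does not name a command), and the frequent side is written as a
non-blocking "try" rather than a wait loop to keep it short.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/membarrier.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical shared flag, standing in for a variable "X". */
static atomic_int x_flag;

/* Invoked through syscall(2), as in the membarrier(2) man page example. */
static int sys_membarrier(int cmd, unsigned int flags)
{
    return (int)syscall(SYS_membarrier, cmd, flags, 0);
}

/* Compiler-only barrier (GCC/Clang syntax); emits no fence instruction. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Infrequent side: read(2) returns -> membarrier(2) -> store X = 1. */
static ssize_t slow_side_read_then_publish(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n < 0)
        return -1;
    /* Assumption: MEMBARRIER_CMD_GLOBAL; it can fail (ENOSYS, EINVAL, ...). */
    if (sys_membarrier(MEMBARRIER_CMD_GLOBAL, 0) != 0)
        return -1;
    atomic_store_explicit(&x_flag, 1, memory_order_relaxed);
    return n;
}

/* Frequent side: load X -> compiler barrier -> write(2).
 * Returns -1 with errno = EAGAIN while X is still 0. */
static ssize_t fast_side_try_write(int fd, const void *buf, size_t len)
{
    if (atomic_load_explicit(&x_flag, memory_order_relaxed) != 1) {
        errno = EAGAIN;
        return -1;
    }
    compiler_barrier();
    return write(fd, buf, len);
}
```

In real use the two functions would run on different threads. The point
of the pairing is that only the infrequent side pays for the system call,
while the frequent side gets away with a compiler barrier.
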
All those scenarios imply before/after ordering wrt either membarrier or
the compiler barrier. I notice that io_uring has a "completion" queue.
Let's try to come up with realistic usage scenarios.

So the dependency chain would be provided by e.g.:

* Infrequent read / Frequent write, communicating read completion through variable X

wait for io_uring read request completion -> membarrier -> store X=1

with matching

load from X (waiting for X==1) -> asm volatile (::: "memory") -> submit io_uring write request

or this other scenario:

* Frequent read / Infrequent write, communicating read completion through variable X

load from X (waiting for X==1) -> membarrier -> submit io_uring write request

with matching

wait for io_uring read request completion -> asm volatile (::: "memory") -> store X=1

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com