Hi Paul, On Mon, Jun 07, 2021 at 08:25:33AM -0700, Paul E. McKenney wrote: > On Mon, Jun 07, 2021 at 12:52:35PM +0100, Will Deacon wrote: > > It's the conditional instructions that are more fun. For example, the CSEL > > instruction: > > > > CSEL X0, X1, X2, <cond> > > > > basically says: > > > > if (cond) > > X0 = X1; > > else > > X0 = X2; > > > > these are just register-register operations, but the idea is that the CPU > > can predict that "branching event" inside the CSEL instruction and > > speculatively rename X0 while waiting for the condition to resolve. > > > > So then you can add loads and stores to the mix along the lines of: > > > > LDR X0, [X1] // X0 = *X1 > > CMP X0, X2 > > CSEL X3, X4, X5, EQ // X3 = (X0 == X2) ? X4 : X5 > > STR X3, [X6] // MUST BE ORDERED AFTER THE LOAD > > STR X7, [X8] // Can be reordered > > > > (assuming X1, X6, X8 all point to different locations in memory) > > > > So now we have a dependency from the load to the first store, but the > > interesting part is that the last store is _not_ ordered wrt either of the > > other two memory accesses, whereas it would be if we used a conditional > > branch instead of the CSEL. Make sense? > > And if I remember correctly, this is why LKMM orders loads in the > "if" condition only with stores in the "then" and "else" clauses, > not with stores after the end of the "if" statement. Or is there > some case that I am missing? It's not clear to me that such a restriction prevents the compiler from using any of the arm64 conditional instructions in place of the conditional branch in such a way that you end up with an "independent" store in the assembly output constructed from two stores on the "then" and "else" paths which the compiler determined where the same. > > Now, obviously the compiler is blissfully unaware that conditional > > data processing instructions can give rise to dependencies than > > conditional branches, so the question really is how much do we need to > > care in the kernel? > > > > My preference is to use load-acquire instead of control dependencies so > > that we don't have to worry about this, or any future relaxations to the > > CPU architecture, at all. > > From what I can see, ARMv8 has DMB(LD) and DMB(ST). Does it have > something like a DMB(LD,ST) that would act something like powerpc lwsync? > > Or are you proposing rewriting the "if" conditions to upgrade > READ_ONCE() to smp_load_acquire()? Or something else? > > Just trying to find out exactly what you are proposing. ;-) Some options are: (1) Do nothing until something actually goes wrong (and hope we spot/debug it) (2) Have volatile_if force a conditional branch, assuming that it solves the problem and doesn't hurt codegen (I still haven't convinced myself for either case) (3) Upgrade READ_ONCE() to RCpc acquire, relaxed atomic RMWs to RCsc acquire on arm64 (4) Introduce e.g. READ_ONCE_CTRL(), atomic_add_return_ctrl() etc specifically for control dependencies and upgrade only those for arm64 (5) Work to get toolchain support for dependency ordering and use that I'm suggesting (3) or (4) because, honestly, it feels like we're being squeezed from both sides with both the compiler and the hardware prepared to break control dependencies. Will