Re: [RFC] LKMM: Add volatile_if()

Hi Paul,

On Mon, Jun 07, 2021 at 08:25:33AM -0700, Paul E. McKenney wrote:
> On Mon, Jun 07, 2021 at 12:52:35PM +0100, Will Deacon wrote:
> > It's the conditional instructions that are more fun. For example, the CSEL
> > instruction:
> > 
> > 	CSEL	X0, X1, X2, <cond>
> > 
> > basically says:
> > 
> > 	if (cond)
> > 		X0 = X1;
> > 	else
> > 		X0 = X2;
> > 
> > these are just register-register operations, but the idea is that the CPU
> > can predict that "branching event" inside the CSEL instruction and
> > speculatively rename X0 while waiting for the condition to resolve.
> > 
> > So then you can add loads and stores to the mix along the lines of:
> > 
> > 	LDR	X0, [X1]		// X0 = *X1
> > 	CMP	X0, X2
> > 	CSEL	X3, X4, X5, EQ		// X3 = (X0 == X2) ? X4 : X5
> > 	STR	X3, [X6]		// MUST BE ORDERED AFTER THE LOAD
> > 	STR	X7, [X8]		// Can be reordered
> > 
> > (assuming X1, X6, X8 all point to different locations in memory)
> > 
> > So now we have a dependency from the load to the first store, but the
> > interesting part is that the last store is _not_ ordered wrt either of the
> > other two memory accesses, whereas it would be if we used a conditional
> > branch instead of the CSEL. Make sense?
> 
> And if I remember correctly, this is why LKMM orders loads in the
> "if" condition only with stores in the "then" and "else" clauses,
> not with stores after the end of the "if" statement.  Or is there
> some case that I am missing?

It's not clear to me that such a restriction prevents the compiler from
using any of the arm64 conditional instructions in place of the conditional
branch in such a way that you end up with an "independent" store in the
assembly output constructed from two stores on the "then" and "else" paths
which the compiler determined were the same.
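
To make that concern concrete, here is a minimal userspace sketch (not kernel code). READ_ONCE()/WRITE_ONCE() are simplified int-only stand-ins for the kernel macros, and do_a()/do_b() and both function names are hypothetical. The second function is one transformation the compiler is permitted to make on the first, not something it is guaranteed to do:

```c
/* Simplified stand-ins for the kernel macros (assumption: int-sized only). */
#define READ_ONCE(x)		(*(const volatile int *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile int *)&(x) = (v))

static int calls_a, calls_b;
static void do_a(void) { calls_a++; }	/* hypothetical "then" work */
static void do_b(void) { calls_b++; }	/* hypothetical "else" work */

/* As written: identical stores inside the "then" and "else" clauses. */
static void ctrl_dep_as_written(int *flag, int *out)
{
	if (READ_ONCE(*flag)) {
		WRITE_ONCE(*out, 1);
		do_a();
	} else {
		WRITE_ONCE(*out, 1);
		do_b();
	}
}

/*
 * What the compiler may produce once it sees both stores are the same:
 * a single store that is no longer conditional on the load, so nothing
 * in the generated code orders it after the load of *flag.
 */
static void ctrl_dep_after_merge(int *flag, int *out)
{
	int f = READ_ONCE(*flag);

	WRITE_ONCE(*out, 1);	/* hoisted: independent of f */
	if (f)
		do_a();
	else
		do_b();
}
```

Both functions have the same single-threaded behaviour, which is exactly why the compiler is free to pick the second form; only a concurrent observer could tell them apart.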

> > Now, obviously the compiler is blissfully unaware that conditional
> > data processing instructions can give rise to different dependencies than
> > conditional branches, so the question really is how much do we need to
> > care in the kernel?
> > 
> > My preference is to use load-acquire instead of control dependencies so
> > that we don't have to worry about this, or any future relaxations to the
> > CPU architecture, at all.
> 
> From what I can see, ARMv8 has DMB(LD) and DMB(ST).  Does it have
> something like a DMB(LD,ST) that would act something like powerpc lwsync?
> 
> Or are you proposing rewriting the "if" conditions to upgrade
> READ_ONCE() to smp_load_acquire()?  Or something else?
> 
> Just trying to find out exactly what you are proposing.  ;-)

Some options are:

 (1) Do nothing until something actually goes wrong (and hope we spot/debug it)

 (2) Have volatile_if force a conditional branch, assuming that it solves
     the problem and doesn't hurt codegen (I still haven't convinced myself
     for either case)

 (3) Upgrade READ_ONCE() to RCpc acquire, relaxed atomic RMWs to RCsc
     acquire on arm64

 (4) Introduce e.g. READ_ONCE_CTRL(), atomic_add_return_ctrl() etc
     specifically for control dependencies and upgrade only those for
     arm64

 (5) Work to get toolchain support for dependency ordering and use that

I'm suggesting (3) or (4) because, honestly, it feels like we're being
squeezed from both sides with both the compiler and the hardware prepared
to break control dependencies.
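
As a rough sketch of the idea behind (4), using C11 atomics in userspace rather than kernel primitives: read_once_ctrl() below is a hypothetical stand-in for the READ_ONCE_CTRL() name floated above, and publish_if_ready() is an invented caller. The point is only that the load heading the control dependency becomes an acquire load, so the ordering of the later store no longer relies on the branch surviving compilation:

```c
#include <stdatomic.h>

/*
 * Hypothetical helper: an acquire load used where the code would
 * otherwise rely on a control dependency. On arm64 a compiler would
 * typically emit LDAR here, or LDAPR when targeting the RCpc
 * extension (assumption about codegen, not checked here).
 */
static inline int read_once_ctrl(_Atomic int *p)
{
	return atomic_load_explicit(p, memory_order_acquire);
}

/*
 * Invented example caller: the store is ordered after the load by the
 * acquire semantics, whether or not the compiler keeps the branch.
 */
static void publish_if_ready(_Atomic int *ready, _Atomic int *out, int val)
{
	if (read_once_ctrl(ready))
		atomic_store_explicit(out, val, memory_order_relaxed);
}
```

Option (3) would amount to giving every READ_ONCE() this treatment on arm64, whereas (4) confines the cost to the call sites that are annotated.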

Will


