On Thu, Jul 06, 2017 at 06:04:13PM -0700, Palmer Dabbelt wrote:
[...]
> >> +#define __smp_load_acquire(p)						\
> >> +do {									\
> >> +	union { typeof(*p) __val; char __c[1]; } __u =			\
> >> +		{ .__val = (__force typeof(*p)) (v) };			\
> >> +	compiletime_assert_atomic_type(*p);				\
> >> +	switch (sizeof(*p)) {						\
> >> +	case 1:								\
> >> +	case 2:								\
> >> +		__u.__val = READ_ONCE(*p);				\
> >> +		smb_mb();						\
> >> +		break;							\
> >> +	case 4:								\
> >> +		__asm__ __volatile__ (					\
> >> +			"amoor.w.aq %1, zero, %0"			\
> >> +			: "+A" (*p)					\
> >> +			: "=r" (__u.__val)				\
> >> +			: "memory");					\
> >> +		break;							\
> >> +	case 8:								\
> >> +		__asm__ __volatile__ (					\
> >> +			"amoor.d.aq %1, zero, %0"			\
> >> +			: "+A" (*p)					\
> >> +			: "=r" (__u.__val)				\
> >> +			: "memory");					\
> >> +		break;							\
> >> +	}								\
> >> +	__u.__val;							\
> >> +} while (0)
> >
> > 'creative' use of amoswap and amoor :-)
> >
> > You should really look at a normal load with ordering instruction
> > though, that amoor.aq is a rmw and will promote the cacheline to
> > exclusive (and dirty it).
>
> The thought here was that implementations could elide the MW by pattern
> matching the "zero" (x0, the architectural zero register) forms of AMOs where
> it's interesting.  I talked to one of our microarchitecture guys, and while he
> agrees that's easy he points out that eliding half the AMO may wreak havoc on
> the consistency model.  Since we're not sure what the memory model is actually
> going to look like, we thought it'd be best to just write the simplest code
> here
>
>   /*
>    * TODO_RISCV_MEMORY_MODEL: While we could emit AMOs for the W and D sized
>    * accesses here, it's questionable if that actually helps or not: the lack of
>    * offsets in the AMOs means they're usually preceded by an addi, so they
>    * probably won't save code space.  For now we'll just emit the fence.
>    */
>   #define __smp_store_release(p, v)					\
>   ({									\
>           compiletime_assert_atomic_type(*p);				\
>           smp_mb();							\
>           WRITE_ONCE(*p, v);						\
>   })
>
>   #define __smp_load_acquire(p)						\
>   ({									\
>           union{typeof(*p) __p; long __l;} __u;				\

AFAICT, there seems to be an endian issue if you do this. No?

Let us assume typeof(*p) is char and *p == 1, and on a big endian 32bit
platform:

>           compiletime_assert_atomic_type(*p);				\
>           __u.__l = READ_ONCE(*p);					\

	READ_ONCE(*p) is 1 so
	__u.__l is 0x00 00 00 01 now

>           smp_mb();							\
>           __u.__p;							\

	__u.__p is then 0x00.

Am I missing something here?

Even so why not use the simple definition as in
include/asm-generic/barrier.h?

Regards,
Boqun

>   })
>
[...]
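
For anyone who wants to check the byte-order argument above without a
big-endian machine at hand, here is a minimal userspace sketch. It is not
kernel code: store_be32(), store_le32(), first_byte() and
show_union_readback() are hypothetical helpers made up for this
illustration, and a 4-byte buffer stands in for the 32-bit long in the
union. The only assumption is that reading the char member of
union { char __p; long __l; } returns the byte at offset 0 of the storage.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: lay out v as a big-endian machine stores a
 * 32-bit long, most significant byte at the lowest address. */
static void store_be32(unsigned char buf[4], uint32_t v)
{
	buf[0] = (unsigned char)(v >> 24);
	buf[1] = (unsigned char)(v >> 16);
	buf[2] = (unsigned char)(v >> 8);
	buf[3] = (unsigned char)v;
}

/* Hypothetical helper: same value, little-endian layout. */
static void store_le32(unsigned char buf[4], uint32_t v)
{
	buf[0] = (unsigned char)v;
	buf[1] = (unsigned char)(v >> 8);
	buf[2] = (unsigned char)(v >> 16);
	buf[3] = (unsigned char)(v >> 24);
}

/* Stand-in for reading __u.__p (a char) after writing __u.__l:
 * the char member aliases the byte at offset 0. */
static unsigned char first_byte(const unsigned char buf[4])
{
	return buf[0];
}

/* Walk through the case from the mail: typeof(*p) is char, *p == 1. */
static void show_union_readback(void)
{
	unsigned char be[4], le[4];

	store_be32(be, 1);	/* __u.__l = READ_ONCE(*p), big endian */
	store_le32(le, 1);	/* same store, little endian */

	/* On big endian the 1 ends up in the last byte, not the first. */
	assert(be[3] == 0x01);

	printf("big endian:    __u.__p = 0x%02x\n", first_byte(be));
	printf("little endian: __u.__p = 0x%02x\n", first_byte(le));
}
```

Calling show_union_readback() from a main() reports 0x00 for the
big-endian layout and 0x01 for the little-endian one, which is the
mismatch described above. As far as I can tell, the definition in
include/asm-generic/barrier.h avoids this by keeping the loaded value in
a temporary of type typeof(*p) rather than going through a long.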