On Tue, Mar 30, 2021 at 4:26 AM Guo Ren <guoren@xxxxxxxxxx> wrote: > On Mon, Mar 29, 2021 at 9:56 PM Arnd Bergmann <arnd@xxxxxxxx> wrote: > > On Mon, Mar 29, 2021 at 2:52 PM Guo Ren <guoren@xxxxxxxxxx> wrote: > > > On Mon, Mar 29, 2021 at 7:31 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > > > > > > > What's the architectural guarantee on LL/SC progress for RISC-V ? > > > > "When LR/SC is used for memory locations marked RsrvNonEventual, > > software should provide alternative fall-back mechanisms used when > > lack of progress is detected." > > > > My reading of this is that if the example you tried stalls, then either > > the PMA is not RsrvEventual, and it is wrong to rely on ll/sc on this, > > or that the PMA is marked RsrvEventual but the implementation is > > buggy. > > Yes, PMA just defines physical memory region attributes, But in our > processor, when MMU is enabled (satp's value register > 2) in s-mode, > it will look at our custom PTE's attributes BIT(63) ref [1]: > > PTE format: > | 63 | 62 | 61 | 60 | 59 | 58-8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 > SO C B SH SE RSW D A G U X W R V > ^ ^ ^ ^ ^ > BIT(63): SO - Strong Order > BIT(62): C - Cacheable > BIT(61): B - Bufferable > BIT(60): SH - Shareable > BIT(59): SE - Security > > So the memory also could be RsrvNone/RsrvEventual. I was not talking about RsrvNone, which would clearly mean that you cannot use lr/sc at all (trap would trap, right?), but "RsrvNonEventual", which would explain the behavior you described in an earlier reply: | u32 a = 0x55aa66bb; | u16 *ptr = &a; | | CPU0 CPU1 | ========= ========= | xchg16(ptr, new) while(1) | WRITE_ONCE(*(ptr + 1), x); | | When we use lr.w/sc.w implement xchg16, it'll cause CPU0 deadlock. As I understand, this example must not cause a deadlock on a compliant hardware implementation when the underlying memory has RsrvEventual behavior, but could deadlock in case of RsrvNonEventual > [1] https://github.com/c-sky/csky-linux/commit/e837aad23148542771794d8a2fcc52afd0fcbf88 > > > > > It also seems that the current "amoswap" based implementation > > would be reliable independent of RsrvEventual/RsrvNonEventual. > > Yes, the hardware implementation of AMO could be different from LR/SC. > AMO could use ACE snoop holding to lock the bus in hw coherency > design, but LR/SC uses an exclusive monitor without locking the bus. > > RISC-V hasn't CAS instructions, and it uses LR/SC for cmpxchg. I don't > think LR/SC would be slower than CAS, and CAS is just good for code > size. What I meant here is that the current spinlock uses a simple amoswap, which presumably does not suffer from the lack of forward process you described. Arnd