On Wed, Apr 7, 2021 at 11:43 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Tue, Apr 06, 2021 at 09:15:50AM +0200, Peter Zijlstra wrote:
> > Anyway, given you have such a crap architecture (and here I thought
> > RISC-V was supposed to be a modern design *sigh*), you had better go
> > look at the sparc64 atomic implementation which has a software backoff
> > for failed CAS in order to make fwd progress.
>
> It wasn't supposed to be modern. It was supposed to use boring old
> ideas. Where it actually did that it is a great ISA, in parts where
> academics actually tried to come up with cool or state of the art
> ideas (interrupt handling, tlb shootdowns, the totally fucked up
> memory model) it turned into a trainwreck.

Gentlemen, please rethink your wording.
RISC-V is neither "crap" nor a "trainwreck", regardless of whether you
like it or not.

The comparison with sparc64 is not applicable, as sparc64 does not
have LL/SC instructions.

Further, it is not the case that RISC-V has no guarantees at all.
It just does not provide a forward-progress guarantee for a
synchronization implementation that writes to a memory location in an
endless loop while trying to complete an LL/SC loop on the same memory
location at the same time.
If there is a reasonable algorithm that relies on forward progress in
this case, then we should indeed think about that, but I haven't seen
one so far.
The whole idea of the MCS lock is to spin on a different memory
location per CPU to improve scalability (reduce cacheline bouncing).
That's a clear indicator that well-scaling synchronization algorithms
need to avoid contended cache lines anyway.

RISC-V defines LR/SC loops consisting of up to 16 instructions as
constrained LR/SC loops. Such constrained LR/SC loops provide the
expected forward-progress guarantees (similar to what other
architectures, like AArch64, have).

What RISC-V does not have is sub-word atomics. If they are required,
we would have to implement them as LL/SC sequences.
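
To illustrate the sub-word point, here is a minimal sketch of a byte-wide
compare-and-swap emulated on top of a word-sized one. It is written in
portable C11 rather than RISC-V assembly; on a RISC-V target with the "A"
extension a compiler would typically lower the 32-bit compare-exchange to
an LR.W/SC.W sequence. The helper name byte_cmpxchg and the "lane"
parameter are hypothetical and only serve the illustration; this is not
the kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compare-and-swap one 8-bit lane of a 32-bit word.
 * Lane 0 is bits 0..7, lane 3 is bits 24..31 (by value, not by address).
 * Returns true if the lane held 'expected' and was replaced by 'desired'.
 */
static bool byte_cmpxchg(_Atomic uint32_t *word, unsigned lane,
                         uint8_t expected, uint8_t desired)
{
    assert(lane < 4);

    const unsigned shift = lane * 8;
    const uint32_t mask = (uint32_t)0xff << shift;
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);

    for (;;) {
        /* Give up if the target lane does not hold the expected value. */
        if ((uint8_t)((old & mask) >> shift) != expected)
            return false;

        /* Keep the three neighbouring lanes, replace only the target. */
        uint32_t updated = (old & ~mask) | ((uint32_t)desired << shift);

        /*
         * Word-sized CAS. On failure 'old' is refreshed with the current
         * word, so a concurrent change to a neighbouring lane just causes
         * another iteration.
         */
        if (atomic_compare_exchange_weak_explicit(word, &old, updated,
                                                  memory_order_seq_cst,
                                                  memory_order_relaxed))
            return true;
    }
}
```

The lane is selected by bit position instead of by byte address, which
keeps the sketch independent of endianness. Note that writes to the
neighbouring lanes of the same word can force retries, which is the same
contention concern as discussed above.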
And yes, using atomic instructions is preferred over using LL/SC,
because atomics tend to perform better (fewer instructions and fewer
spilled registers). But that depends on the actual ISA implementation.

Respectfully,
Christoph