On Tue, Apr 19, 2022 at 7:41 AM Andrea Parri <parri.andrea@xxxxxxxxx> wrote:
>
> > > Seems to me that you are basically reverting 5ce6c1f3535f
> > > ("riscv/atomic: Strengthen implementations with fences"). That commit
> > > fixed an memory ordering issue, could you explain why the issue no
> > > longer needs a fix?
> >
> > I'm not reverting the prior patch, just optimizing it.
> >
> > In RISC-V “A” Standard Extension for Atomic Instructions spec, it said:
>
> With reference to the RISC-V herd specification at:
>
>   https://github.com/riscv/riscv-isa-manual.git
>
> the issue, better, lr-sc-aqrl-pair-vs-full-barrier seems to _no longer_
> need a fix since commit:

        "0:     lr.w %0, %2\n"                  \
        "       bne  %0, %z3, 1f\n"             \
        "       sc.w.rl %1, %z4, %2\n"          \
        "       bnez %1, 0b\n"                  \
        "       fence rw, rw\n"                 \

Above is the current implementation, and its logic is inconsistent: it
puts .rl on the sc and then adds a fence rw, rw on top. If we want a
full barrier, we should implement it like below:

        "       fence rw, w\n"                  \
        "0:     lr.w %0, %2\n"                  \
        "       bne  %0, %z3, 1f\n"             \
        "       sc.w %1, %z4, %2\n"             \
        "       bnez %1, 0b\n"                  \
        "       fence rw, rw\n"                 \

That way lr.w and sc.w themselves can execute as fast as possible.

If we think .aq/.rl won't affect the forward-progress guarantee, we
should implement it like below:

        "0:     lr.w %0, %2\n"                  \
        "       bne  %0, %z3, 1f\n"             \
        "       sc.w.aqrl %1, %z4, %2\n"        \
        "       bnez %1, 0b\n"                  \

Using .aqrl is better than sc.w.rl + fence rw, rw. First, the
forward-progress guarantee of an lr/sc.rl pair is the same as that of
lr/sc.aqrl, and only the sc.rl part would affect the speed of the lr/sc
sequence. Second, it saves the overhead of one fence rw, rw. So for
riscv, we needn't put a full barrier after the sc as arm64 does; we can
use .aqrl instead.

>
>   03a5e722fc0f ("Updates to the memory consistency model spec")
>
> (here a template, to double check:
>
>   https://github.com/litmus-tests/litmus-tests-riscv/blob/master/tests/non-mixed-size/HAND/LR-SC-NOT-FENCE.litmus )
>
> I defer to Daniel/others for a "bi-section" of the prose specification.
> ;-)
>
>   Andrea

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
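For readers who want to experiment without a RISC-V toolchain, the two shapes compared above (a plain lr/sc loop bracketed by fences versus ordering carried by the atomic operation itself) can be sketched in portable C11. This is only an analogue under stated assumptions: the function names are hypothetical, C11 memory orders are not the same thing as the kernel's "fully ordered" semantics, and the instructions a compiler emits for either variant are its own choice, so this should not be read as the kernel implementation. Unlike the assembly, the fenced variant also runs its trailing fence when the comparison fails.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Variant A (hypothetical name): a relaxed compare-exchange bracketed
 * by explicit fences. Loosely mirrors "fence; lr.w/sc.w; fence".
 * Returns the value observed in *p before the operation.
 */
int cmpxchg_fenced(_Atomic int *p, int old, int new_val)
{
    int expected = old;

    /* Order earlier accesses before the store half of the exchange. */
    atomic_thread_fence(memory_order_release);
    atomic_compare_exchange_strong_explicit(p, &expected, new_val,
                                            memory_order_relaxed,
                                            memory_order_relaxed);
    /* Full fence after the exchange (here also on failure). */
    atomic_thread_fence(memory_order_seq_cst);

    /* On success expected == old; on failure it holds the seen value. */
    return expected;
}

/*
 * Variant B (hypothetical name): the ordering is attached to the
 * atomic operation itself, loosely mirroring an annotated sc.
 */
int cmpxchg_seq_cst(_Atomic int *p, int old, int new_val)
{
    int expected = old;

    atomic_compare_exchange_strong_explicit(p, &expected, new_val,
                                            memory_order_seq_cst,
                                            memory_order_seq_cst);
    return expected;
}

/* Single-threaded usage check: both variants agree on the result. */
void cmpxchg_selfcheck(void)
{
    _Atomic int v = 5;

    assert(cmpxchg_fenced(&v, 5, 7) == 5);   /* matches, stores 7 */
    assert(cmpxchg_fenced(&v, 5, 9) == 7);   /* mismatch, v stays 7 */
    assert(cmpxchg_seq_cst(&v, 7, 1) == 7);  /* matches, stores 1 */
    assert(atomic_load(&v) == 1);
}
```

A single-threaded check like cmpxchg_selfcheck() only confirms the functional result; whether the two shapes are equivalent in ordering strength is exactly the kind of question the litmus test linked above is meant to answer.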