On Fri, 15 Sept 2023 at 17:38, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Fri, Sep 15, 2023 at 05:27:17PM -0700, Linus Torvalds wrote:
> > On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle)
> > <willy@xxxxxxxxxxxxx> wrote:
> > >
> > > +       "1:     ldl_l %0,%4\n"
> > > +       "       xor %0,%3,%0\n"
> > > +       "       xor %0,%3,%2\n"
> > > +       "       stl_c %0,%1\n"
> >
> > What an odd thing to do.
> >
> > Why don't you just save the old value? That double xor looks all kinds
> > of strange, and is a data dependency for no good reason that I can
> > see.
> >
> > Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead?
> >
> > Not that I think alpha matters, but since I was looking through the
> > series, this just made me go "Whaa?"
>
> Well, this is my first time writing Alpha assembler ;-)  I stole this
> from ATOMIC_OP_RETURN:
>
>         "1:     ldl_l %0,%1\n"                          \
>         "       " #asm_op " %0,%3,%2\n"                 \
>         "       " #asm_op " %0,%3,%0\n"                 \

Note how that does "orig" assignment first (ie the '%2' destination is
the first instruction), unlike your version.

So in that ATOMIC_OP_RETURN, it does indeed do the same ALU op twice,
but there's no data dependency between the two, so they can execute in
parallel.

> but yes, mov would do the trick here.  Is it really faster than xor?

No, I think "mov src,dst" is just a pseudo-op for "or src,src,dst",
there's no actual "mov" instruction, iirc.

So it's an ALU op too.

What makes your version expensive is the data dependency, not the ALU
op.

So the *odd* thing is not that you have two xor's per se, but how you
create the original value by xor'ing the value once, and then xoring
the new value with the same mask, giving you the original value back -
but with that odd data dependency so that it won't schedule in the
same cycle.

Does any of this matter? Nope. It's alpha. There's probably a handful
of machines, and it's maybe one extra cycle. It's really the oddity
that threw me.

In ATOMIC_OP_RETURN, the reason it does that op twice is simply that
it wants to return the new value. But you literally made it return the
*old* value by doing an xor twice in succession, which reverses the
bits twice.

Was that really what you intended?

                 Linus