On Sat, 5 Oct 2024 at 16:37, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> Sadly, that is not correct; neither gcc nor clang uses lea:

Looking around, this may be intentional. At least according to Agner,
several cores do better at "mov immediate" compared to "lea".

Eg a RIP-relative LEA on Zen 2 gets a throughput of two per cycle, but
a "MOV r,i" gets four. That got fixed in Zen 3 and later, but
apparently Intel had similar issues (Ivy Bridge: 1 LEA per cycle, vs 3
"mov i,r". Haswell is 1:4).

Of course, Agner's tables are good, but not necessarily always the
whole story. There are other instruction tables on the internet (eg
uops.info) with possibly more info.

And in reality, I would expect it to be a complete non-issue with any
OoO engine and real code, because you are very seldom ALU limited,
particularly when there aren't any data dependencies.

But a RIP-relative LEA does seem to put a *bit* more pressure on the
core resources, so the compilers may be right to pick a "mov".

              Linus