From: Al Viro
> Sent: 23 July 2020 16:21
...
> The point is, your "~4.5 cycles per vector" is pretty much noise and the
> difference between the 3-argument and 4-argument variants could easily be
> in the same range. It might be a valid microoptimization, it might be not.
> 3-argument variant is simpler and IMO in absence of strong data we ought
> to go with that.

There is definitely more to be gained by rewriting the x86-64 asm.

David