On Sun, May 08, 2022 at 09:37:26AM +0900, Stafford Horne wrote: > On Thu, May 05, 2022 at 05:38:58AM +0900, Stafford Horne wrote: > > On Wed, May 04, 2022 at 01:10:03PM -0700, Linus Torvalds wrote: > > > On Wed, May 4, 2022 at 12:58 PM Stafford Horne <shorne@xxxxxxxxx> wrote: > > > > > > > > I have uploaded a diff I created here: > > > > https://gist.github.com/54334556f2907104cd12374872a0597c > > > > > > > > It shows the same output. > > > > > > In hex_to_bin itself it seems to only be a difference due to some > > > register allocation (r19 and r3 switched around). > > > > > > But then it gets inlined into hex2bin and there changes there seem to > > > be about instruction and basic block scheduling, so it's a lot harder > > > to see what's going on. > > > > > > And a lot of constant changes, which honestly look just like code code > > > moved around by 16 bytes and offsets changed due to that. > > > > > > So I doubt it's hex_to_bin() that is causing problems, I think it's > > > purely code movement. Which explains why adding a nop or a fake printk > > > fixes things. > > > > > > Some alignment assumption that got broken? > > > > This is what it looks like to me too. I will have to do a deep dive on what is > > going on with this particular build combination as I can't figure out what it is > > off the top of my head. > > > > This test is using a gcc 11 compiler, I tried with my gcc 12 toolchain and the > > issue cannot be reproduced. > > > > - musl gcc 11 - https://musl.cc/or1k-linux-musl-cross.tgz > > - openrisc gcc 12 - https://github.com/openrisc/or1k-gcc/releases/tag/or1k-12.0.1-20220210-20220304 > > > > But again the difference between the two compiler outputs is a lot of register > > allocation and offsets changes. Its not easy to see anything that stands out. > > I checked the change log for the openrisc specific changes from gcc 11 to gcc > > 12. Nothing seems to stand out, mcount profiler fix for PIC, a new large binary > > link flag. > > Hello, > > Just an update on this. The issue so far has been traced to the alignment of > the crypto multiplication function fe_mul_impl in lib/crypto/curve25519-fiat32.c. > > This patch e5be15767e7e ("hex2bin: make the function hex_to_bin constant-time") > allowed for the alignment to be just right to cause the failure. Without > this patch and forcing the alignment we can reproduce the issue. So though the > bisect is correct, this patch is not the root cause of the issue. > > Using some l.nop sliding techniques and some strategically placed .align > statements I have been able to reproduce the issue on: > > - gcc 11 and gcc 12 > - preempt and non-preempt kernels > > I have not been able to reproduce this on my FPGA, so far only QEMU. My > hunch now is that since the fe_mul_impl function contains some rather long basic > blocks it appears that timer interrupts that interrupt qemu mid basic block > execution somehow is causing this. The hypothesis is it may be basic block > resuming behavior in qemu that causes incosistent behavior. > > It will take a bit more time to trace this. Since I maintain OpenRISC QEMU the > issue is on me. > > Again, It's safe to say that patch e5be15767e7e ("hex2bin: make the function > hex_to_bin constant-time") is not an issue. This issue has been fixed. I sent a patch to QEMU for it: - https://lore.kernel.org/qemu-devel/20220511120541.2242797-1-shorne@xxxxxxxxx/T/#u The issue was a bug in the OpenRISC emulation in QEMU which was triggered when receiving a TICK TIMER interrupt, in a delay slot, on a page boundary. The fix was simple enough, but investigation took quite some work. Thanks, -Stafford