On 8 June 2018 at 11:54, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Sat, May 26, 2018 at 9:22 AM, Eric Biggers <ebiggers3@xxxxxxxxx> wrote:
>> On Sat, May 12, 2018 at 10:43:08AM +0200, Dmitry Vyukov wrote:
>>> On Fri, Feb 2, 2018 at 11:18 PM, Eric Biggers <ebiggers3@xxxxxxxxx> wrote:
>>> > On Fri, Feb 02, 2018 at 02:57:32PM +0100, Dmitry Vyukov wrote:
>>> >> On Fri, Feb 2, 2018 at 2:48 PM, syzbot
>>> >> <syzbot+ffa3a158337bbc01ff09@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> >> > Hello,
>>> >> >
>>> >> > syzbot hit the following crash on upstream commit
>>> >> > 7109a04eae81c41ed529da9f3c48c3655ccea741 (Thu Feb 1 17:37:30 2018 +0000)
>>> >> > Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide
>>> >> >
>>> >> > So far this crash happened 4 times on net-next, upstream.
>>> >> > C reproducer is attached.
>>> >> > syzkaller reproducer is attached.
>>> >> > Raw console output is attached.
>>> >> > compiler: gcc (GCC) 7.1.1 20170620
>>> >> > .config is attached.
>>> >>
>>> >>
>>> >> From suspicious frames I see salsa20_asm_crypt there, so +crypto maintainers.
>>> >>
>>> >
>>> > Looks like the x86 implementations of Salsa20 (both i586 and x86_64) need to be
>>> > updated to not use %ebp/%rbp.
>>>
>>> Ard,
>>>
>>> This was bisected as introduced by:
>>>
>>> commit 83dee2ce1ae791c3dc0c9d4d3a8d42cb109613f6
>>> Author: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
>>> Date: Fri Jan 19 12:04:34 2018 +0000
>>>
>>>     crypto: sha3-generic - rewrite KECCAK transform to help the
>>>     compiler optimize
>>>
>>> https://gist.githubusercontent.com/dvyukov/47f93f5a0679170dddf93bc019b42f6d/raw/65beac8ddd30003bbd4e9729236dc8572094abf7/gistfile1.txt
>>
>> Note that syzbot's original C reproducer (from Feb 1) for this actually
>> triggered the warning through salsa20-asm, which I've just proposed to "fix" by
>> https://patchwork.kernel.org/patch/10428863/. sha3-generic is apparently
>> another instance of the same bug, where the %rbp register is used for data.
>
>
> Mailed "crypto: don't optimize keccakf()" to fix this.
>
> Amusingly __optimize("O3") always lead to degraded performance as gcc does not
> inline across different optimizations levels, so keccakf() wasn't inlined
> into its callers and keccakf_round() wasn't inlined into keccakf().

That does not make sense. The -O3 definitely made the code run slightly
faster on AArch64, but I don't remember the exact numbers or the
compiler version. In any case, it wasn't an improvement worth obsessing
about compared to the 14x speedup I got on A53 from rewriting the code
itself.
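
For anyone who wants to check the inlining question on their own compiler, here is a minimal sketch of the pattern being discussed: a per-function optimize("O3") attribute on an outer loop that calls a helper built at the file's default level. This is not the kernel's sha3 code. toy_round(), toy_permute() and toy_permute_plain() are hypothetical stand-ins, and the attribute is GCC-specific (other compilers may ignore it with a warning).

```c
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's keccakf()/keccakf_round(). */

/* Rotate left; the masks keep both shift counts in the range 0..63. */
static uint64_t rol64(uint64_t v, unsigned int s)
{
	return (v << (s & 63)) | (v >> ((64 - s) & 63));
}

/* Helper built at the translation unit's default optimization level. */
static void toy_round(uint64_t st[4])
{
	st[0] ^= rol64(st[1], 1);
	st[1] ^= rol64(st[2], 3);
	st[2] ^= rol64(st[3], 6);
	st[3] ^= rol64(st[0], 10);
}

/* GCC-specific: request -O3 for this one function only. */
__attribute__((optimize("O3")))
void toy_permute(uint64_t st[4])
{
	for (int i = 0; i < 24; i++)
		toy_round(st);
}

/* Same loop without the attribute, for comparison in the assembly. */
void toy_permute_plain(uint64_t st[4])
{
	for (int i = 0; i < 24; i++)
		toy_round(st);
}
```

Building this with "gcc -O2 -S" and comparing the assembly of the two functions should show whether toy_round() was inlined in each case. The outcome may well depend on the compiler version and target, which is the point in dispute above.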