On an i386 build, I now get

  crypto/blake2b_generic.c: In function ‘blake2b_compress_one_generic’:
  crypto/blake2b_generic.c:109:1: warning: the frame size of 2640 bytes is larger than 2048 bytes [-Wframe-larger-than=]

probably due to upgrading to gcc-12. But who knows - it's been several months since I bothered to do a 32-bit build, so maybe it was something else.

That stack frame is disgusting on x86-64 too, but at least a bit less so (it's "only" 592 bytes there). I assume there are fewer spills due to more registers or something, and then gcc doesn't go all crazy.

The actual data arrays it uses should use 256 bytes plus a few other things, so the expansion due to spilling(?) is truly ludicrous.

I assumed it was some debug option causing the compiler to not re-use spill slots or something like that. We've had that before. But disabling KASAN did nothing for the stack use. Neither did disabling UBSAN or the gcc plugins.

I then started to think it's the same issue that clang hit, and that Arnd fixed with 0c0408e86dbe ("crypto: blake2b - Fix clang optimization for ARMv7-M"), but adding

  -fno-unroll-loops -fno-peel-loops

didn't do anything either. Neither did adding a

  #pragma GCC unroll 0

where the clang #pragma is.

So this is a "Help me, Obi-Wan Kenobi. You're my only hope" email.

Anybody got any ideas?

It's worth noting that clang does much better. On both i386 and x86-64, it does a stack frame of just over 400 bytes. So this very much is about gcc.

                Linus