Re: blake2b_compress_one_generic() stack use with gcc-12

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey Linus,

On Sat, Jun 11, 2022 at 12:24:29PM -0700, Linus Torvalds wrote:
> So this is a "Help me, Obi-Wan Kenobi. You're my only hope" email.
> 
> Anybody got any ideas?

The way to fix this is by re-rolling the rounds:

diff --git a/crypto/blake2b_generic.c b/crypto/blake2b_generic.c
index 6704c0355889..ac50c3086f6b 100644
--- a/crypto/blake2b_generic.c
+++ b/crypto/blake2b_generic.c
@@ -89,18 +89,9 @@ static void blake2b_compress_one_generic(struct blake2b_state *S,
 	v[14] = BLAKE2B_IV6 ^ S->f[0];
 	v[15] = BLAKE2B_IV7 ^ S->f[1];

-	ROUND(0);
-	ROUND(1);
-	ROUND(2);
-	ROUND(3);
-	ROUND(4);
-	ROUND(5);
-	ROUND(6);
-	ROUND(7);
-	ROUND(8);
-	ROUND(9);
-	ROUND(10);
-	ROUND(11);
+	for (i = 0; i < 12; ++i)
+		ROUND(i);
+
 #ifdef CONFIG_CC_IS_CLANG
 #pragma nounroll /* https://bugs.llvm.org/show_bug.cgi?id=45803 */
 #endif

The good thing about doing this is that it makes the function smaller,
so it fits easier into cache. The bad thing is that it means the
blake2b_sigma array can't be inlined, so you have accesses into that,
which means overall worse performance if called repeatedly (which maybe
it is? This is only used for btrfs disk hashing afaik). BLAKE2 is
actually 12 different permutations, rather than one, which is partially
how they got away with reducing the number of rounds from the original
BLAKE SHA-3 finalist. But without fancy permutation SIMD instructions,
it also means you're left with this trade off in the generic C code.

However, getting back to the actual issue you pointed out, this seems
like an odd gcc 12 issue. Or perhaps it's some aspect of the inliner
there? I noticed that on gcc 11, the above patch increases the frame
size from 416 to 704. But on gcc 12 it decreases it from 2632 to 680.
Huh.

Jason



[Index of Archives]     [Kernel]     [Gnu Classpath]     [Gnu Crypto]     [DM Crypt]     [Netfilter]     [Bugtraq]
  Powered by Linux