From: Alexey Dobriyan <adobriyan@xxxxxxxxx>
Date: Wed, 15 Feb 2012 22:27:52 +0300

> On Wed, Feb 15, 2012 at 12:23:52AM -0500, David Miller wrote:
>> From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxxx>
>> Date: Wed, 15 Feb 2012 16:16:08 +1100
>>
>> > OK, so we grew by 1136 - 888 = 248.  Keep in mind that 128 of
>> > that is expected since we moved W onto the stack.
>>
>> Right.
>>
>> > I guess we could go back to the percpu solution, what do you
>> > think?
>>
>> I'm not entirely sure, we might have to.
>>
>> sha512 is notorious for generating terrible code with gcc on 32-bit
>> targets, so...  The sha512 test in the glibc testsuite tends to
>> timeout on 32-bit sparc. :-)
>
> Cherrypicking ror64() commit largely fixes the issue (on sparc-defconfig):
>
> 	00000000 <sha512_transform>:
> 	       0:	9d e3 bc 78	save  %sp, -904, %sp
>
> git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> b85a088f15f2070b7180735a231012843a5ac96c
> "crypto: sha512 - use standard ror64()"

I'm happy with a solution that involves pushing this change to Linus's
tree.  It's pretty clear why it helps so much, although I'm disappointed
that gcc can't see that the u64 shift argument passed in is always a
constant and therefore way within the range of a 32-bit value, ho hum :-)

In fact, in my tree, this change brings the stack allocation
instruction down to:

	save	%sp, -824, %sp	!

which is actually BETTER than what the old per-cpu code got:

	save	%sp, -984, %sp	!

Therefore I highly recommend we apply that ror() change to Linus's
tree now. :-)
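To illustrate the point about the shift argument, here is a minimal
user-space sketch contrasting a rotate helper that takes a u64 shift
count with one that takes an unsigned int, which is the shape of the
standard ror64().  The names ror64_wide, ror64_narrow, e0 and
check_rotations_agree are hypothetical, the bodies are written for
illustration rather than copied from the kernel, and masking the shift
count is an assumption added so the sketch has no undefined behaviour
in portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a local rotate helper whose shift count is
 * a 64-bit value.  On a 32-bit target the count occupies two registers
 * unless the compiler proves it is a small constant.
 */
static inline uint64_t ror64_wide(uint64_t x, uint64_t y)
{
	return (x >> (y & 63)) | (x << ((-y) & 63));
}

/*
 * Hypothetical stand-in for the standard helper: same rotation, but
 * the shift count is a plain unsigned int.
 */
static inline uint64_t ror64_narrow(uint64_t word, unsigned int shift)
{
	return (word >> (shift & 63)) | (word << ((-shift) & 63));
}

/* SHA-512 Sigma0 as defined in FIPS 180-4: ROTR28 ^ ROTR34 ^ ROTR39. */
static inline uint64_t e0(uint64_t x)
{
	return ror64_narrow(x, 28) ^ ror64_narrow(x, 34) ^ ror64_narrow(x, 39);
}

/* Both variants must agree for every rotation count SHA-512 uses. */
static void check_rotations_agree(uint64_t x)
{
	static const unsigned int counts[] = {
		1, 8, 14, 18, 19, 28, 34, 39, 41, 61
	};
	size_t i;

	for (i = 0; i < sizeof(counts) / sizeof(counts[0]); i++)
		assert(ror64_wide(x, counts[i]) == ror64_narrow(x, counts[i]));
}
```

The two helpers compute the same value, so any difference shows up only
in the generated code.  Building both variants for a 32-bit target and
comparing the save instruction at the top of the disassembly, as done
above for sha512_transform, would be one way to check whether a given
compiler behaves the same way.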