On Wed, Jan 25, 2012 at 6:43 PM, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>
> This push fixes a race condition in sha512 that affects users
> who use it in process context and softirq context concurrently,
> in particular, this affects IPsec.  The result of the race is
> the production of incorrect hashes, which for IPsec leads to
> loss of connectivity.

Ugh. This once more has the crazy signed integer modulus operator,
which can be quite expensive depending on whether the compiler can
tell whether it is always positive or not.

Also, that modulus is exposed everywhere. In git, the sha1
implementation (which has many of the same issues) does this:

    /* This "rolls" over the 512-bit array */
    #define W(x) (array[(x)&15])

which means that the modulus exists in just one place (and is the
correct binary 'and', not the possibly-expensive division).

We also avoid the problem with absolutely horrible gcc register usage
by having an arch-specific "accessor macro":

    /*
     * If you have 32 registers or more, the compiler can (and should)
     * try to change the array[] accesses into registers. However, on
     * machines with less than ~25 registers, that won't really work,
     * and at least gcc will make an unholy mess of it.
     *
     * So to avoid that mess which just slows things down, we force
     * the stores to memory to actually happen (we might be better off
     * with a 'W(t)=(val);asm("":"+m" (W(t))' there instead, as
     * suggested by Artur Skawina - that will also make gcc unable to
     * try to do the silly "optimize away loads" part because it won't
     * see what the value will be).
     *
     * Ben Herrenschmidt reports that on PPC, the C version comes close
     * to the optimized asm with this (ie on PPC you don't want that
     * 'volatile', since there are lots of registers).
     *
     * On ARM we get the best code generation by forcing a full memory barrier
     * between each SHA_ROUND, otherwise gcc happily get wild with spilling and
     * the stack frame size simply explode and performance goes down the drain.
     */

    #if defined(__i386__) || defined(__x86_64__)
      #define setW(x, val) (*(volatile unsigned int *)&W(x) = (val))
    #elif defined(__GNUC__) && defined(__arm__)
      #define setW(x, val) do { W(x) = (val); __asm__("":::"memory"); } while (0)
    #else
      #define setW(x, val) (W(x) = (val))
    #endif

which is not pretty, but as you guys found out, the alternative can be
much worse (ie totally crazy gcc register spilling).

                Linus
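
For illustration only, here is a minimal sketch of the difference between the signed modulus and the binary 'and' discussed above. It is not the kernel sha512 code or the git sha1 code: the 16-word `array[]` and the helpers `index_mod()`, `index_and()`, `rolling_demo()` and `self_check()` are hypothetical names made up for this example, and the masking result assumes a two's-complement machine. Only the `W(x)` macro is taken from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-word rolling window (16 x 32 bits = 512 bits). */
static uint32_t array[16];

/* The macro quoted in the message: the wrap-around lives in one place. */
#define W(x) (array[(x) & 15])

/*
 * Signed modulus: since C99 the result has the sign of the dividend,
 * so a negative t gives a negative "index". Unless the compiler can
 * prove t >= 0, it also cannot reduce this to a plain mask.
 */
static int index_mod(int t)
{
	return t % 16;
}

/* Binary 'and': always in 0..15 (assuming two's complement). */
static int index_and(int t)
{
	return t & 15;
}

/* Fill the window, then read it back through wrapped indices. */
static uint32_t rolling_demo(void)
{
	int t;

	for (t = 0; t < 16; t++)
		W(t) = (uint32_t)t * 10u;

	/* W(1 - 3) wraps to array[14]; W(17) wraps to array[1]. */
	return W(1 - 3) + W(17);
}

/* Small sanity check of the two index helpers. */
static void self_check(void)
{
	assert(index_mod(-3) == -3);	/* out of range for a 16-entry array */
	assert(index_and(-3) == 13);	/* wrapped into range */
	assert(index_mod(19) == index_and(19));
}
```

For non-negative indices the two helpers agree; they only differ when the index expression can go negative, which is exactly the case the compiler has to guard against when it sees a signed `%`.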