I played around with tweaking the code a bit more, and I got our performance down to a 2.077182x slowdown with the check and a 1.055961x slowdown without checking. However, that second figure is basically with the check turned off through our API. If I rip out the extraneous code for storing states and for checking whether we are doing collision detection, I can reach performance parity with the block-sha1 implementation in the Git codebase, which basically tells me that this is about as good as I can do for optimizing the C code.

SHA-1 is more amenable to an assembler implementation because of its use of rotations, which are notoriously difficult to access through C code. And as this happens in the inner loop of the function, inline asm tends not to cut it. This is one of the reasons that the OpenSSL SHA-1 runs like a scalded monkey compared to the C implementations.

Marc and I have also discussed using SIMD operations to speed up the UBC checks, which could definitely help achieve better performance, but is highly dependent on processor support. It will take some time to do either a SIMD implementation of the UBC checks or an assembler implementation.

At this point, I would suggest that I take the C optimizations, clean them up, and fold them in with the diet changes Linus has suggested. The slowdown is still 2x over block-sha1, and more over OpenSSL, but it is better than nothing. Then, if there is interest, Marc and I can investigate other processor-specific optimizations like ASM or SIMD and circle back with those at a later date.

Also, to Johannes Schindelin's point:

> My concern is about that unexpected turn "oh, let's just switch to C99 because, well, because my compiler can handle it, and everybody else should just switch to a modern compiler". That really sounded careless.

While it will probably be a pain, if it is a requirement, we can modify the code to move away from any C99-specific stuff we have in here, if that makes adopting the code more palatable for Git.

Thanks,
Dan
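
P.S. To illustrate the rotation point for anyone who has not looked at it: C has no rotate operator, so a rotate has to be spelled as two shifts and an OR. Below is a minimal sketch of that idiom. The name rol32 is hypothetical and is not taken from the sha1dc or block-sha1 sources, and I have deliberately kept it free of C99 features (no stdint.h, no inline). Whether a given compiler turns this into a single rotate instruction is something to check in the generated assembly rather than assume.

```c
/*
 * Rotate the low 32 bits of x left by n bits.
 * Hypothetical helper for illustration only.
 * unsigned long is at least 32 bits wide, so we mask to 32 bits
 * to get the same result whether it is 32 or 64 bits.
 */
static unsigned long rol32(unsigned long x, unsigned int n)
{
    n &= 31;                 /* rotating by 32 is the same as rotating by 0 */
    x &= 0xffffffffUL;       /* discard anything above bit 31 */
    if (n == 0)
        return x;            /* avoid a shift by the full 32-bit width */
    return ((x << n) | (x >> (32 - n))) & 0xffffffffUL;
}
```

The SHA-1 round function rotates by 5 and by 30 on every step, so calls of the form rol32(a, 5) and rol32(b, 30) sit right in the inner loop, which is where the cost shows up.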