On 8 April 2015 at 15:40, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
> On 8 April 2015 at 15:30, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>> On Wed, Apr 08, 2015 at 03:25:14PM +0200, Ard Biesheuvel wrote:
>>>
>>> Not having to call the function twice is the whole point. In the arm64
>>> case, all the SHA-256 round keys can be kept in registers (it has 32
>>> 16-byte SIMD registers), and that is what motivates this pattern. By
>>> passing a head block, a pointer to the source and the generic pointer
>>> (which arm64 uses to finalize the block), we can process all data in a
>>> single invocation of the block transform.
>>
>> Does this really make any difference?  With IPsec the partial code
>> path is never even going to get executed.
>>
>
> This is not the partial code path, it is the .finup path, in fact.
> Anything that hashes data that is often a multiple of the block size
> (which is more likely for block based applications than for IPsec, I
> think) should benefit from this. But even if it is not, using a head
> block and a pointer to the src eliminates one call of the block
> transform.
>
> Note that, in the arm64 case, calling a SHA-256 block transform in
> non-process context involves:
> - stacking the contents of 28 SIMD registers (28 x 16 = 448 bytes)
> - loading the SHA-256 constants (16 x 16 = 256 bytes)
> - processing the data
> - unstacking the contents of 28 SIMD registers (448 bytes)
>
> so anything that can prevent needlessly calling these functions
> multiple times in quick succession is going to help, and 'just
> calling it twice' just doesn't cut it.
>

OK, stacking/unstacking can be amortized over multiple invocations of
the block transform, only loading the round constants cannot.

>>> Do note that these are only used by static inline functions, so the
>>> unused arguments are all eliminated from the binary anyway.
>>> In fact, looking at the generated code, the function calls don't use
>>> function pointers at all anymore, but just call the block transform
>>> directly, so the typedef is only used as a prototype, really.
>>
>> It's not just the generated code.  The next guy that comes along
>> and writes a SHA implementation is going to go WTH is this p
>> argument.  I'm not going to add crap to the generic layer just
>> because ARM needs it.  In fact ARM doesn't even need it.
>>
>
> OK, so there are 2 pieces of crap [sic] in this proposed generic layer:
> - the head block
> - the generic pointer
>
> The generic pointer is used in the arm64 case to convey the
> information that the current invocation of the block transform is the
> final one, and the core code can apply the padding and finalize /and/
> pass back whether it has done so or not. (the latter can easily be
> done in the C code as well) I used a generic pointer to allow other
> uses, but if you have a better idea for this particular use case, I'd
> be happy to hear it.
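
To make the "one call instead of two" argument concrete, here is a
minimal user-space sketch of the head-block pattern discussed above.
It is not the code from the patch: struct toy_state, toy_mix(),
toy_block_fn() and both update helpers are hypothetical names, and the
mixing step is a placeholder rather than SHA-256. The counter
transform_calls stands in for the per-call cost (register
stacking/unstacking and constant loading) described in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64

/* Hypothetical state for illustration; not the kernel's sha256 state. */
struct toy_state {
	uint32_t h;               /* stand-in for the digest words */
	uint8_t  buf[BLOCK_SIZE]; /* buffered partial block */
	size_t   buflen;          /* number of valid bytes in buf */
};

/* Counts entries into the block transform (models the per-call overhead). */
static unsigned int transform_calls;

/* Placeholder per-block mixing step. This is NOT SHA-256. */
static void toy_mix(struct toy_state *st, const uint8_t *block)
{
	for (size_t i = 0; i < BLOCK_SIZE; i++)
		st->h = st->h * 31u + block[i];
}

/*
 * Block transform in the shape under discussion: an optional head block,
 * then 'blocks' full blocks read from src, plus a generic pointer p
 * (unused in this sketch).
 */
static void toy_block_fn(struct toy_state *st, const uint8_t *head,
			 const uint8_t *src, size_t blocks, void *p)
{
	(void)p;
	transform_calls++;
	if (head)
		toy_mix(st, head);
	while (blocks--) {
		toy_mix(st, src);
		src += BLOCK_SIZE;
	}
}

/* Update that passes the completed buffer as the head block: one call. */
static void toy_update_head(struct toy_state *st, const uint8_t *data,
			    size_t len)
{
	const uint8_t *head = NULL;
	size_t blocks;

	assert(st->buflen < BLOCK_SIZE);
	if (st->buflen) {
		size_t fill = BLOCK_SIZE - st->buflen;

		if (len < fill) {
			memcpy(st->buf + st->buflen, data, len);
			st->buflen += len;
			return;
		}
		memcpy(st->buf + st->buflen, data, fill);
		data += fill;
		len -= fill;
		head = st->buf;
	}
	blocks = len / BLOCK_SIZE;
	if (head || blocks)
		toy_block_fn(st, head, data, blocks, NULL);
	/* The head block has been consumed, so buf can now hold the tail. */
	data += blocks * BLOCK_SIZE;
	len -= blocks * BLOCK_SIZE;
	memcpy(st->buf, data, len);
	st->buflen = len;
}

/* Same update without a head block: buffer and src need separate calls. */
static void toy_update_twice(struct toy_state *st, const uint8_t *data,
			     size_t len)
{
	size_t blocks;

	assert(st->buflen < BLOCK_SIZE);
	if (st->buflen) {
		size_t fill = BLOCK_SIZE - st->buflen;

		if (len < fill) {
			memcpy(st->buf + st->buflen, data, len);
			st->buflen += len;
			return;
		}
		memcpy(st->buf + st->buflen, data, fill);
		data += fill;
		len -= fill;
		toy_block_fn(st, NULL, st->buf, 1, NULL);
	}
	blocks = len / BLOCK_SIZE;
	if (blocks)
		toy_block_fn(st, NULL, data, blocks, NULL);
	data += blocks * BLOCK_SIZE;
	len -= blocks * BLOCK_SIZE;
	memcpy(st->buf, data, len);
	st->buflen = len;
}
```

Both helpers produce the same state for the same input. The difference
is only in how often the transform is entered when a partial block is
already buffered and the new data completes it and carries further full
blocks: the head-block variant enters once, the other variant twice.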