Re: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512

Markus Stockhausen <stockhausen@xxxxxxxxxxx> · Sun, 29 Mar 2015 08:29:39 +0000

> From: linux-crypto-owner@xxxxxxxxxxxxxxx [linux-crypto-owner@xxxxxxxxxxxxxxx] on behalf of Ard Biesheuvel [ard.biesheuvel@xxxxxxxxxx]
> Sent: Saturday, 28 March 2015 23:10
> To: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-crypto@xxxxxxxxxxxxxxx; samitolvanen@xxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxx; jussi.kivilinna@xxxxxx
> Cc: Ard Biesheuvel
> Subject: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512
> 
> To reduce the number of copies of boilerplate code throughout
> the tree, this patch implements generic glue for the SHA-512
> algorithm. This allows a specific arch or hardware implementation
> to only implement the special handling that it needs.

Hi Ard,

Implementing a common layer is a very good idea. I did not want to
implement the glue code yet again for some recently developed PPC
crypto modules. With my very limited crypto experience, I was surprised
that my optimized implementations degraded disproportionately for small
inputs, i.e. updates of 256 bytes or less, compared to some very old
basic implementations. Below you will find some hints that might apply
to your implementation too. That way, all new implementations based on
your framework could benefit immediately.

> ...
> +int sha384_base_init(struct shash_desc *desc)
> +{
> +       struct sha512_state *sctx = shash_desc_ctx(desc);
> +
> +       *sctx = (struct sha512_state){
> +               .state = {
> +                       SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3,
> +                       SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7,
> +               }
> +       };
> +       return 0;
> +}

IIRC the above code will initialize the whole context, including the
64/128 byte buffer. Assigning the 8 initial hash values directly was
faster in my case.
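
A minimal user-space sketch of the direct-assignment variant, assuming a
local stand-in for struct sha512_state with the same three fields
(state[8], count[2], buf[128]). The function name sha384_base_init_direct
is hypothetical; the initial values are the SHA-384 IVs from FIPS 180-4.
It writes only the 80 bytes of chaining values and counters and leaves
the 128-byte buffer alone, which is safe as long as the update path never
reads a buffer byte before writing it (true for the quoted
sha512_base_do_update).

```c
#include <stdint.h>

#define SHA512_BLOCK_SIZE 128

/* SHA-384 initial hash values (FIPS 180-4, section 5.3.4). */
#define SHA384_H0 0xcbbb9d5dc1059ed8ULL
#define SHA384_H1 0x629a292a367cd507ULL
#define SHA384_H2 0x9159015a3070dd17ULL
#define SHA384_H3 0x152fecd8f70e5939ULL
#define SHA384_H4 0x67332667ffc00b31ULL
#define SHA384_H5 0x8eb44a8768581511ULL
#define SHA384_H6 0xdb0c2e0d64f98fa7ULL
#define SHA384_H7 0x47b5481dbefa4fa4ULL

/* Local stand-in mirroring the fields of the kernel's struct sha512_state. */
struct sha512_state {
	uint64_t state[8];
	uint64_t count[2];
	uint8_t buf[SHA512_BLOCK_SIZE];
};

/*
 * Hypothetical init variant: assign the eight chaining values and the
 * 128-bit byte counter directly. buf is deliberately not cleared, since
 * the update path writes every buffer byte before it is consumed.
 */
int sha384_base_init_direct(struct sha512_state *sctx)
{
	sctx->state[0] = SHA384_H0;
	sctx->state[1] = SHA384_H1;
	sctx->state[2] = SHA384_H2;
	sctx->state[3] = SHA384_H3;
	sctx->state[4] = SHA384_H4;
	sctx->state[5] = SHA384_H5;
	sctx->state[6] = SHA384_H6;
	sctx->state[7] = SHA384_H7;
	sctx->count[0] = 0;
	sctx->count[1] = 0;
	return 0;
}
```

The compound-literal version may compile to a zeroing of the whole
208-byte structure plus the state stores; whether skipping the buffer is
measurably faster depends on compiler and CPU.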

> ...
> +int sha512_base_do_update(struct shash_desc *desc, const u8 *data,
> +                         unsigned int len, sha512_block_fn *block_fn, void *p)
> +{
> +       struct sha512_state *sctx = shash_desc_ctx(desc);
> +       unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
> +
> +       sctx->count[0] += len;
> +       if (sctx->count[0] < len)
> +               sctx->count[1]++;

You should check whether returning early at this point, when the buffer
will not be filled up, is faster than handling the large-data case first.
That can improve performance for small updates while leaving large ones
unaffected.
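
A minimal sketch of the early-exit layout, assuming the same local stand-in
for struct sha512_state and a block function type inferred from the
block_fn() call in the patch (head block, if non-NULL, processed before
`blocks` full blocks from src). The names sha512_do_update_early_exit and
count_blocks are hypothetical; count_blocks only counts blocks so the
control flow can be checked. The slow path follows the quoted code.

```c
#include <stdint.h>
#include <string.h>

#define SHA512_BLOCK_SIZE 128

/* Local stand-in mirroring the fields of the kernel's struct sha512_state. */
struct sha512_state {
	uint64_t state[8];
	uint64_t count[2];
	uint8_t buf[SHA512_BLOCK_SIZE];
};

/* Block function shape assumed from the block_fn() call in the patch. */
typedef void sha512_block_fn(int blocks, const uint8_t *src, uint64_t *state,
			     const uint8_t *head, void *p);

/* Hypothetical update variant with an early return for small inputs. */
int sha512_do_update_early_exit(struct sha512_state *sctx, const uint8_t *data,
				unsigned int len, sha512_block_fn *block_fn,
				void *p)
{
	unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
	int blocks;

	/* 128-bit byte counter: carry into count[1] on overflow. */
	sctx->count[0] += len;
	if (sctx->count[0] < len)
		sctx->count[1]++;

	/* Fast path: no block completes, so just buffer the data and leave. */
	if (partial + len < SHA512_BLOCK_SIZE) {
		memcpy(sctx->buf + partial, data, len);
		return 0;
	}

	/* Slow path: top up the buffered partial block, if any. */
	if (partial) {
		unsigned int fill = SHA512_BLOCK_SIZE - partial;

		memcpy(sctx->buf + partial, data, fill);
		data += fill;
		len -= fill;
	}

	blocks = len / SHA512_BLOCK_SIZE;
	len %= SHA512_BLOCK_SIZE;

	/* Process the completed buffer (if any), then the full input blocks. */
	block_fn(blocks, data, sctx->state, partial ? sctx->buf : NULL, p);
	data += blocks * SHA512_BLOCK_SIZE;

	/* Keep the tail for the next call; it starts at offset 0. */
	if (len)
		memcpy(sctx->buf, data, len);
	return 0;
}

/* Hypothetical demo block function: counts how many blocks it was given. */
static int blocks_seen;

void count_blocks(int blocks, const uint8_t *src, uint64_t *state,
		  const uint8_t *head, void *p)
{
	(void)src;
	(void)state;
	(void)p;
	blocks_seen += blocks + (head ? 1 : 0);
}
```

The quoted code already tests the same condition; the difference is only
layout (the common small case returns immediately), so any gain has to be
confirmed by measurement.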

> +
> +       if ((partial + len) >= SHA512_BLOCK_SIZE) {
> +               int blocks;
> +
> +               if (partial) {
> +                       int p = SHA512_BLOCK_SIZE - partial;
> +
> +                       memcpy(sctx->buf + partial, data, p);
> +                       data += p;
> +                       len -= p;
> +               }
> +
> +               blocks = len / SHA512_BLOCK_SIZE;
> +               len %= SHA512_BLOCK_SIZE;
> +
> +               block_fn(blocks, data, sctx->state,
> +                        partial ? sctx->buf : NULL, p);
> +               data += blocks * SHA512_BLOCK_SIZE;
> +               partial = 0;
> +       }
> +       if (len)
> +               memcpy(sctx->buf + partial, data, len);
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL(sha512_base_do_update);
> +
> +int sha512_base_do_finalize(struct shash_desc *desc, sha512_block_fn *block_fn,
> +                           void *p)
> +{
> +       static const u8 padding[SHA512_BLOCK_SIZE] = { 0x80, };
> +
> +       struct sha512_state *sctx = shash_desc_ctx(desc);
> +       unsigned int padlen;
> +       __be64 bits[2];
> +
> +       padlen = SHA512_BLOCK_SIZE -
> +                (sctx->count[0] + sizeof(bits)) % SHA512_BLOCK_SIZE;
> +
> +       bits[0] = cpu_to_be64(sctx->count[1] << 3 |
> +                             sctx->count[0] >> 61);
> +       bits[1] = cpu_to_be64(sctx->count[0] << 3);
> +
> +       sha512_base_do_update(desc, padding, padlen, block_fn, p);

I know that this is the most intuitive and straightforward way to handle
finalization. Nevertheless, the (maybe a little obscure) generic MD5 code,
which writes the 0x80 terminator, the zero padding and the length directly
into the buffer, gives best-in-class performance for finalizing small
inputs.
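
A minimal sketch of that MD5-style in-place finalization transferred to
SHA-512, assuming the same local stand-in for struct sha512_state and the
block function type inferred from the patch. The names
sha512_do_finalize_inplace, put_be64 and record_blocks are hypothetical;
record_blocks just counts blocks and keeps a copy of the last one.

```c
#include <stdint.h>
#include <string.h>

#define SHA512_BLOCK_SIZE 128

/* Local stand-in mirroring the fields of the kernel's struct sha512_state. */
struct sha512_state {
	uint64_t state[8];
	uint64_t count[2];
	uint8_t buf[SHA512_BLOCK_SIZE];
};

/* Block function shape assumed from the block_fn() call in the patch. */
typedef void sha512_block_fn(int blocks, const uint8_t *src, uint64_t *state,
			     const uint8_t *head, void *p);

/* Store a 64-bit value in big-endian order, regardless of host endianness. */
static void put_be64(uint8_t *dst, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		dst[i] = (uint8_t)v;
		v >>= 8;
	}
}

/* Hypothetical finalize: pad in place instead of calling update(). */
int sha512_do_finalize_inplace(struct sha512_state *sctx,
			       sha512_block_fn *block_fn, void *p)
{
	/* The 16-byte bit count occupies the last 16 bytes of the block. */
	const unsigned int len_off = SHA512_BLOCK_SIZE - 16;
	unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;

	sctx->buf[partial++] = 0x80;

	/* No room left for the length: zero-fill and flush this block. */
	if (partial > len_off) {
		memset(sctx->buf + partial, 0, SHA512_BLOCK_SIZE - partial);
		block_fn(1, sctx->buf, sctx->state, NULL, p);
		partial = 0;
	}

	memset(sctx->buf + partial, 0, len_off - partial);
	/* Byte count -> bit count as a 128-bit big-endian integer. */
	put_be64(sctx->buf + len_off, sctx->count[1] << 3 | sctx->count[0] >> 61);
	put_be64(sctx->buf + len_off + 8, sctx->count[0] << 3);
	block_fn(1, sctx->buf, sctx->state, NULL, p);
	return 0;
}

/* Hypothetical demo block function: counts blocks, keeps the last one. */
static int blocks_seen;
static uint8_t last_block[SHA512_BLOCK_SIZE];

void record_blocks(int blocks, const uint8_t *src, uint64_t *state,
		   const uint8_t *head, void *p)
{
	(void)state;
	(void)p;
	if (head) {
		memcpy(last_block, head, SHA512_BLOCK_SIZE);
		blocks_seen++;
	}
	for (int i = 0; i < blocks; i++) {
		memcpy(last_block, src + i * SHA512_BLOCK_SIZE,
		       SHA512_BLOCK_SIZE);
		blocks_seen++;
	}
}
```

With up to 111 bytes buffered a single block_fn call suffices; with 112 to
127 bytes buffered there is no room for the 16-byte length, so a second
block is needed.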

For comparison: going by the raw numbers, the sha1-ppc-spe assembler
module I wrote is only 10% faster than the old sha1-powerpc assembler
module. Both are plain assembler implementations without hardware
acceleration. For large inputs I gain another 8% by avoiding function
calls, because the core routine can process several blocks per call. But
for small single-block updates, the glue code optimizations described
above gave:

 16 byte block, single update: +24%
 64 byte block, single update: +16%
256 byte block, single update: +12%

For CPU-assisted SHA implementations, that percentage may be even higher.

Maybe worth the effort ... 

Markus
****************************************************************************
This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.

e-mails sent over the internet may have been written under a wrong name or
been manipulated. That is why this message sent as an e-mail is not a
legally binding declaration of intention.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln

executive board:
Kadir Akin
Dr. Michael Höhnerbach

President of the supervisory board:
Hans Kristian Langva

Registry office: district court Cologne
Register number: HRB 52 497

****************************************************************************