On Fri, 4 Oct 2019 at 16:59, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>
> On Fri, Oct 4, 2019 at 4:44 PM Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
> > The round count is passed via the fifth function parameter, so it is
> > already on the stack. Reloading it for every block doesn't sound like
> > a huge deal to me.
>
> Please benchmark it to indicate that, if it really isn't a big deal. I
> recall finding that memory accesses on common mips32r2 commodity
> router hardware was extremely inefficient. The whole thing is designed
> to minimize memory accesses, which are the primary bottleneck on that
> platform.
>

Reloading a single word from the stack each time we load, xor and store
64 bytes of data from/to memory is highly unlikely to be noticeable.

> Seems like this thing might be best deferred for after this all lands.
> IOW, let's get this in with the 20 round original now, and later you
> can submit a change for the 12 round and René and I can spend time
> dusting off our test rigs and seeing which strategy works best. I very
> nearly tossed out a bunch of old router hardware last night when
> cleaning up. Glad I saved it!

I don't agree but I don't care deeply enough to argue about it :-)