On Fri, Oct 4, 2019 at 4:44 PM Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
> The round count is passed via the fifth function parameter, so it is
> already on the stack. Reloading it for every block doesn't sound like
> a huge deal to me.

Please benchmark it to demonstrate that, if it really isn't a big deal. I recall finding that memory accesses on common mips32r2 commodity router hardware were extremely inefficient. The whole thing is designed to minimize memory accesses, which are the primary bottleneck on that platform.

It seems like this might be best deferred until after this all lands. IOW, let's get this in with the original 20-round version now, and later you can submit a change for the 12-round one, and René and I can spend time dusting off our test rigs and seeing which strategy works best.

I very nearly tossed out a bunch of old router hardware last night when cleaning up. Glad I saved it!