On Fri, Sep 25, 2020 at 02:05:09AM -0700, Carlo Arenas wrote:

> > > [stock]
> > > Benchmark #1: t/helper/test-tool sha1 <foo.rand
> > >   Time (mean ± σ):      6.638 s ±  0.081 s    [User: 6.269 s, System: 0.368 s]
> > >   Range (min … max):    6.550 s …  6.841 s    10 runs
>
> slightly offtopic but what generates this nicely formatted output?

It's this:

  https://github.com/sharkdp/hyperfine

It will actually run both versions and compare them, but it's a little
more involved to set up (since you have to do a build step in between).

> > I cannot speak for s390, since I have never owned one
>
> I happen to be lucky enough to have access to one (RHEL 8.2/z15, gcc
> 8.3.1) and seems (third consecutive run):
>
> stock: user: 7.555s, system: 1.191s
> -DNO_UNALIGNED_LOADS: user: 7.561s, system: 1.189s

Thanks. That's not too surprising. gcc 8 seems to be able to optimize
both versions to the same thing (though I have no idea if s390 has a
bswap instruction).

-Peff
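For anyone wondering what "both versions" look like at the code level,
here is a minimal sketch, assuming a GCC or Clang toolchain, of the two
ways a SHA-1 block function can read a big-endian 32-bit word:
byte-by-byte shifts (the kind of path a NO_UNALIGNED_LOADS build takes)
versus a single, possibly unaligned, word load followed by a byte swap.
The function names are hypothetical and not taken from git's source, and
memcpy stands in for a raw pointer cast to keep the unaligned load
well-defined. A modern gcc can often recognize the shift-and-or pattern
and emit the same single load (plus a swap on little-endian hosts), which
would be consistent with the near-identical s390 numbers. Note that s390
is big-endian, so in this sketch the swap branch compiles out there.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumes GCC/Clang predefined byte-order macros. */
#if !defined(__BYTE_ORDER__) || !defined(__ORDER_LITTLE_ENDIAN__)
#error "this sketch assumes GCC/Clang __BYTE_ORDER__ macros"
#endif

/* Hypothetical helper: portable byte-wise big-endian load, no alignment needed. */
static uint32_t load_be32_bytes(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) |
	       (uint32_t)p[3];
}

/* Hypothetical helper: one 32-bit load, then a byte swap on little-endian hosts. */
static uint32_t load_be32_word(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v)); /* compilers lower this to a plain load */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	v = __builtin_bswap32(v); /* big-endian hosts (e.g. s390) skip this */
#endif
	return v;
}

/* Both loaders must agree for any 4-byte window, aligned or not. */
static void check_loads_agree(const unsigned char *p)
{
	assert(load_be32_bytes(p) == load_be32_word(p));
}
```

Comparing `gcc -O2 -S` output for the two helpers on a given target is a
quick way to see whether the compiler has merged the byte loads.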