On 8/21/2018 11:36 PM, Jeff King wrote:
> On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
>
> > with the obvious "oideq()" implementation added, that seems to get me to
> > 2-3%. Not _quite_ as good as the original branching version I showed.
> > And we had to touch all the callsites (although arguably that kind of
> > "eq" function is a better interface anyway, since it obviously allows
> > for more optimization).
> >
> > So maybe the branching thing is actually not so insane. It makes new
> > hash_algos Just Work; they just won't be optimized. And the change is
> > very localized.
> Hmph. So I went back to double-check my measurements on that branching
> version, and I couldn't replicate it!
I'm actually relieved to see this, as I couldn't either.
> It turns out what I showed (and measured) before has a bug. Can you see
> it?
I had rewritten the section from scratch instead of applying your diff,
so I didn't get the sha1-sha1 error. I decided to sleep on it instead of
sending my email.
> So the assert() version really is the fastest. I didn't test, but I
> suspect we could "trick" the compiler by having the fallback call an
> opaque wrapper around memcmp(). That would prevent it from combining the
> two paths, and presumably it would still optimize the constant-20 side.
> Or maybe it would eventually decide our inline function is getting too
> big and scrap it. Which probably crosses a line of craziness (if I
> didn't already cross it two emails ago).
I appreciate your effort here.
Thanks
-Stolee