On 04/28/2011 12:35 AM, Ingo Molnar wrote:
>          :            while ((obj = obj_hash[i]) != NULL) {
>     4.13 : 498316: eb 1f                  jmp    498337 <lookup_object+0x47>
>     0.00 : 498318: 0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
>     0.00 : 49831f: 00
>          :                if (!hashcmp(sha1, obj->sha1))
>     1.48 : 498320: 48 8d 78 04            lea    0x4(%rax),%rdi
>     0.02 : 498324: 4c 89 d6               mov    %r10,%rsi
>     0.00 : 498327: 4c 89 d9               mov    %r11,%rcx
>    26.12 : 49832a: f3 a6                  repz cmpsb %es:(%rdi),%ds:(%rsi)
>    17.12 : 49832c: 74 14                  je     498342 <lookup_object+0x52>
>          :                    break;
rep cmps can be very slow on some machines, and rep cmpsb in particular
is only optimized for really small strings (the tail of a larger
rep cmps[lq] run).
I think that if you replace hashcmp() with something like
#include <stdint.h>	/* uint64_t / uint32_t */

static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
	uint64_t cmp;

	/* Compare the 20 bytes as two 64-bit words plus one 32-bit word. */
	cmp  = *(const uint64_t *)sha1        ^ *(const uint64_t *)sha2;
	cmp |= *(const uint64_t *)(sha1 + 8)  ^ *(const uint64_t *)(sha2 + 8);
	cmp |= *(const uint32_t *)(sha1 + 16) ^ *(const uint32_t *)(sha2 + 16);

	/* Keep memcmp()-style semantics: 0 means the hashes are equal. */
	return cmp != 0;
}
you'll see much better results.
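
For what it's worth, here is a rough, self-contained micro-benchmark
sketch (mine, not a measurement from the report above); the hash count,
iteration count, clock()-based timing and the fast_hashcmp name are all
arbitrary illustrative choices:

/*
 * Rough micro-benchmark sketch: compares memcmp() against the unrolled
 * XOR compare on 20-byte hashes.  Sizes and timing method are arbitrary.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define NHASHES  4096
#define SHA1_LEN 20

/* Rows padded to 24 bytes and 8-byte aligned (GCC/clang attribute) so the
 * casts below are done on aligned addresses. */
static unsigned char hashes[NHASHES][24] __attribute__((aligned(8)));
static unsigned char needle[24] __attribute__((aligned(8)));

/* Same comparison as the hashcmp() suggested above. */
static inline int fast_hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
	uint64_t cmp;

	cmp  = *(const uint64_t *)sha1        ^ *(const uint64_t *)sha2;
	cmp |= *(const uint64_t *)(sha1 + 8)  ^ *(const uint64_t *)(sha2 + 8);
	cmp |= *(const uint32_t *)(sha1 + 16) ^ *(const uint32_t *)(sha2 + 16);
	return cmp != 0;
}

int main(void)
{
	volatile int sink = 0;	/* keep the compares from being optimized away */
	clock_t t0;
	int i, j, rep;

	srand(1);
	for (i = 0; i < NHASHES; i++)
		for (j = 0; j < SHA1_LEN; j++)
			hashes[i][j] = rand() & 0xff;
	memcpy(needle, hashes[NHASHES - 1], SHA1_LEN);

	t0 = clock();
	for (rep = 0; rep < 10000; rep++)
		for (i = 0; i < NHASHES; i++)
			sink += !memcmp(needle, hashes[i], SHA1_LEN);
	printf("memcmp:       %ld ticks\n", (long)(clock() - t0));

	t0 = clock();
	for (rep = 0; rep < 10000; rep++)
		for (i = 0; i < NHASHES; i++)
			sink += !fast_hashcmp(needle, hashes[i]);
	printf("unrolled xor: %ld ticks\n", (long)(clock() - t0));

	return 0;
}
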
Of course, in general this only works if the hashes are aligned.
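
If alignment can't be guaranteed, one possible workaround (my sketch, not
part of the suggestion above) is to memcpy() the words into locals; GCC and
clang typically lower such fixed-size memcpy() calls to plain loads:

/* Alignment-safe sketch: copy into local integers instead of casting,
 * so no unaligned loads are performed. */
#include <stdint.h>
#include <string.h>

static inline int hashcmp_unaligned(const unsigned char *sha1,
				    const unsigned char *sha2)
{
	uint64_t a1, a2, b1, b2;
	uint32_t c1, c2;

	memcpy(&a1, sha1,      8); memcpy(&a2, sha2,      8);
	memcpy(&b1, sha1 + 8,  8); memcpy(&b2, sha2 + 8,  8);
	memcpy(&c1, sha1 + 16, 4); memcpy(&c2, sha2 + 16, 4);

	/* Again, 0 means equal. */
	return ((a1 ^ a2) | (b1 ^ b2) | (c1 ^ c2)) != 0;
}
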
--
error compiling committee.c: too many arguments to function