Re: {RFC/PATCH] micro-optimize get_sha1_hex()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Sat, 9 Sep 2006, Junio C Hamano wrote:
>
> I was profiling 'git-rev-list v2.16.12..', because I suspected
> insert_by_date() might be expensive (the function inserts into
> singly-linked ordered list, so the data structure has to become
> array based to allow optimization).

I _suspect_ that you profiled using "gprof" and the "-pg" flag to 
the compiler?

The thing is, that generates almost totally bogus profiles. It makes 
simple function calls look extremely - and unrealistically - expensive, 
because the profiling code itself adds lots of overhead, and it then ends 
up showing on the tick-based profile too in a way that it wouldn't show 
in real life.

It tends to be much better to profile unmodified binaries using 
"oprofile", which doesn't end up giving the exact number of calls, but 
where the time-based profiling is a _lot_ more reliable.

That said, clearly "hexval()" can be optimized, and it's quite possible 
that it would show up even in oprofile data too.

> The attached brings get_sha1_hex() down from 15.19% to 5.41%,
> but I feel we should be able to do better.

I doubt it's 5% in real life, but your optimization looks fine to me.

You can make it even _more_ optimized by not even bothering to test at 
each point, but just inlining hexval(), and making it a single array 
lookup. The point being that if you get -1 for _either_ of the two 
lookups, since we shift them and or them together, the end result will be 
invalid, and you only need to _test_ once.

So this patch should generate much better code, where the inner loop is 
something like

<get_sha1_hex+16>:
	add    $0x1,%esi		# count of result bytes..
	cmp    $0x14,%esi		# did we have 20 already?
	mov    %dl,(%ebx)		# save the "last loop" result
	je     0x807630b <get_sha1_hex+75>	# break out
	add    $0x1,%ebx		# update dest pointer (one byte at a time)
	add    $0x2,%ecx		# update source pointer (two hex chars at a time)
	movsbl (%ecx),%eax		# get first character
	movsbl 0x80989e0(%eax),%edx	# get the hexval for it
	movsbl 0x1(%ecx),%eax		# get the second character
	shl    $0x4,%edx		# shift first character hexval up by 4
	movsbl 0x80989e0(%eax),%eax	# get second character hexval
	or     %eax,%edx		# get the byte value
	test   $0xffffff00,%edx		# was it ok (0x00 - 0xff)?
	je     <get_sha1_hex+16>	# yeah, continue

which should perform very well (just two comparisons - one for the "wrong 
bits set" at the end, and one for the 20-byte-counter test at the 
beginning). Small and sweet.

In fact, I don't see how you can much improve on that code even if you 
were to hand-code it, so gcc actually generates good code once the 
"hexval()" function has been written like this.

Not very much tested, but it looks obvious enough.. And it's actually 
conceptually a smaller diff than yours (ie it doesn't change anything but 
the "hexval()" function, and all the bulk of it is the stupid array 
initializer).

		Linus

---
diff --git a/sha1_file.c b/sha1_file.c
index 428d791..b64b92d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -26,15 +26,43 @@ const unsigned char null_sha1[20];
 
 static unsigned int sha1_file_open_flag = O_NOATIME;
 
-static unsigned hexval(char c)
-{
-	if (c >= '0' && c <= '9')
-		return c - '0';
-	if (c >= 'a' && c <= 'f')
-		return c - 'a' + 10;
-	if (c >= 'A' && c <= 'F')
-		return c - 'A' + 10;
-	return ~0;
+static inline unsigned int hexval(unsigned int c)
+{
+	static signed char val[256] = {
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 00-07 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 08-0f */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 10-17 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 18-1f */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 20-27 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 28-2f */
+		  0,  1,  2,  3,  4,  5,  6,  7,		/* 30-37 */
+		  8,  9, -1, -1, -1, -1, -1, -1,		/* 38-3f */
+		 -1, 10, 11, 12, 13, 14, 15, -1,		/* 40-47 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 48-4f */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 50-57 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 58-5f */
+		 -1, 10, 11, 12, 13, 14, 15, -1,		/* 60-67 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 68-67 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 70-77 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 78-7f */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 80-87 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 88-8f */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 90-97 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* 98-9f */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* a0-a7 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* a8-af */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* b0-b7 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* b8-bf */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* c0-c7 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* c8-cf */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* d0-d7 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* d8-df */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* e0-e7 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* e8-ef */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* f0-f7 */
+		 -1, -1, -1, -1, -1, -1, -1, -1,		/* f8-ff */
+	};
+	return val[c];
 }
 
 int get_sha1_hex(const char *hex, unsigned char *sha1)
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]