Re: [PATCH 2/2] Implement a simple delta_base cache

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Sat, 17 Mar 2007, Linus Torvalds wrote:
> 
> Instead of always re-generating the delta bases (possibly over and over 
> and over again), just cache the last few ones. They often can get re-used.

Not just to compare actual timings, this shows the difference in the 
traces I did. Remember, before we had:

	[torvalds@woody linux]$ grep Needs delta-base-trace | wc -l
	469334
	[torvalds@woody linux]$ grep Needs delta-base-trace |sort -u | wc -l
	21933

and now with the simple cache, I get:

	[torvalds@woody linux]$ grep Needs delta-base-trace-new | wc -l
	28688
	[torvalds@woody linux]$ grep Needs delta-base-trace-new | sort -u | wc -l
	21933

ie, we still re-generate some of the objects multiple times, but now, 
rather than generating them (on average) 20+ times each, we now generate 
them an average of just 1.3 times each. Which explains why the wall-time 
goes down by over a factor of two.

Changing the (statically sized) cache from 256 entries to 1024 (and 
updating the hash function appropriately of course) gets the number down 
to 23953 delta-base lookups (the number of unique ones obviously stays the 
same), for an average of just 1.1 object generates per unique object, and 
also means that you occasionally get sub-second times for my test-case of 
logging drivers/usb/.

It all also means that libz isn't really even the top entry in the 
profiles any more, although it's still pretty high. But the profile now 
says:

	samples  %        app name                 symbol name
	41527    15.6550  git                      strlen
	30215    11.3905  git                      inflate
	27504    10.3685  git                      inflate_table
	20321     7.6607  git                      find_pack_entry_one
	16892     6.3680  git                      interesting
	16259     6.1294  vmlinux                  __copy_user_nocache
	16010     6.0355  git                      inflate_fast
	9240      3.4833  git                      get_mode
	8863      3.3412  git                      tree_entry_extract
	7145      2.6935  git                      strncmp
	7131      2.6883  git                      memcpy
	6863      2.5872  git                      diff_tree
	6113      2.3045  git                      adler32
	4515      1.7021  git                      _int_malloc
	3022      1.1392  git                      update_tree_entry
	...

(Adding up all of libz is still ~31%, but it's lower as a percentage *and* 
it's obviously a smaller percentage of a much lower absolute time, so the 
zlib overhead went down much more than any other git overheads did)

In general, this all seems very cool. The patches are simple enough that I 
think this is very safe to merge indeed: the only question I have is that 
somebody should verify that the "struct packed_git *p" is stable over the 
whole lifetime of a process - so that we can use it as a hash key without 
having to invalidate hashes if we unmap a pack (I *think* we just unmap 
the virtual mapping, and "struct packed_git *" stays valid, but Junio 
should ack that for me).

Here's the trivial patch to extend the caching to 1k entries if somebody 
cares. I don't know if the small added performance is worth it.

		Linus
---
diff --git a/sha1_file.c b/sha1_file.c
index a7e3a2a..372af60 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1352,7 +1352,7 @@ static void *unpack_compressed_entry(struct packed_git *p,
 	return buffer;
 }
 
-#define MAX_DELTA_CACHE (256)
+#define MAX_DELTA_CACHE (1024)
 
 static struct delta_base_cache_entry {
 	struct packed_git *p;
@@ -1367,8 +1367,8 @@ static unsigned long pack_entry_hash(struct packed_git *p, off_t base_offset)
 	unsigned long hash;
 
 	hash = (unsigned long)p + (unsigned long)base_offset;
-	hash += (hash >> 8) + (hash >> 16);
-	return hash & 0xff;
+	hash += (hash >> 10) + (hash >> 20);
+	return hash & (MAX_DELTA_CACHE-1);
 }
 
 static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]