[PATCH/RFC] Make the largest pack the first one in the pack search queue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



When git looks for an object in a pack, it has to go through all the
pack indexes. Pack order becomes important because if the pack is
found early, the search stops early. Git sorts packs by mostly by
mtime, favoring recent packs.

This strategy however puts the main pack at the bottom because the
main pack is usually created at clone time and occasional "repack -ad",
so likely the oldest. Though being the largest pack, this pack is
likely hit (even latest commits have unchanged parts, which is likely
in the main pack).

This patch changes pack order a little bit, putting the main pack on
top, the rest is ordered as usual. When we look for latest changes,
it'll likely miss the main pack. But after the first miss,
"last_found_pack" optimization kicks in, thus reduce the miss cost.

I modified find_pack_entry() to record pack misses, then do a few
commonly used commands like "git diff", "git diff --cached", "git diff
HEAD", "git status", "git branch -v", "git log"... Pack miss decreased
in all cases. So I think this might be a good heuristic. But this is
micro-optmization and probably does not give us any significant
performance gain.

A probably better way to improve too-many-packs situation is create
meta-index in memory for small packs.

An implementation note, packed_git->num_objects should be used to
determine the largest pack, but that field may not be available until
open_packed_git() is called. pack_size should be a good enough
approximation.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
---
 sha1_file.c |   21 ++++++++++++++++++++-
 1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 798cd46..05ece2d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1072,6 +1072,24 @@ static int sort_pack(const void *a_, const void *b_)
 	return -1;
 }
 
+static int sort_pack_by_size(const void *a_, const void *b_)
+{
+	struct packed_git *a = *((struct packed_git **)a_);
+	struct packed_git *b = *((struct packed_git **)b_);
+	int st;
+
+	/*
+	 * Local packs tend to contain objects specific to our
+	 * variant of the project than remote ones.  In addition,
+	 * remote ones could be on a network mounted filesystem.
+	 * Favor local ones for these reasons.
+	 */
+	st = a->pack_local - b->pack_local;
+	if (st)
+		return -st;
+	return b->pack_size - a->pack_size;
+}
+
 static void rearrange_packed_git(void)
 {
 	struct packed_git **ary, *p;
@@ -1087,7 +1105,8 @@ static void rearrange_packed_git(void)
 	for (n = 0, p = packed_git; p; p = p->next)
 		ary[n++] = p;
 
-	qsort(ary, n, sizeof(struct packed_git *), sort_pack);
+	qsort(ary, n, sizeof(struct packed_git *), sort_pack_by_size);
+	qsort(ary + 1, n - 1, sizeof(struct packed_git *), sort_pack);
 
 	/* link them back again */
 	for (i = 0; i < n - 1; i++)
-- 
1.7.3.1.256.g2539c.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]