Fix memory leak in "git rev-list --objects"

Martin Langhoff points out that "git repack -a" ends up using a lot of 
memory for big archives, and that git cvsimport should probably do only 
incremental repacks in order to avoid having repacking flush all the 
caches.

The vast majority of the memory usage of repacking comes from git rev-list 
tracking all objects, and this patch should go a long way toward avoiding 
the excessive memory usage: the bulk of it was due to the object names 
being leaked from the tree parser.

For the historic Linux kernel archive, this simple patch gives:

Before:
	/usr/bin/time git-rev-list --all --objects > /dev/null 

	72.45user 0.82system 1:13.55elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+125376minor)pagefaults 0swaps

After:
	/usr/bin/time git-rev-list --all --objects > /dev/null 

	75.22user 0.48system 1:16.34elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+43921minor)pagefaults 0swaps

where we do end up wasting a bit of time on some extra strdup()s (which 
could be avoided, but that would require tracking where the pathnames came 
from), but we avoid a lot of memory usage.
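
The ownership rule behind those extra strdup()s can be sketched in 
isolation. This is only an illustration under stated assumptions, not 
git's code: "struct entry", "struct kept", keep_name() and 
consume_entries() are hypothetical names standing in for the tree 
parser's entries and the long-lived object list. The list takes its own 
copy of each name, so the loop that walks the entries is free to release 
the parser's copy.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one parsed tree entry. */
struct entry {
	char *name;		/* owned by the parser */
	struct entry *next;
};

/* Hypothetical stand-in for the long-lived list of tracked names. */
struct kept {
	char *name;		/* private copy, owned by this list */
	struct kept *next;
};

/* Prepend a private copy of 'name'; the caller keeps ownership of its own. */
static struct kept *keep_name(struct kept *list, const char *name)
{
	struct kept *k;

	assert(name != NULL);
	k = malloc(sizeof(*k));
	if (!k)
		exit(1);
	k->name = strdup(name);
	if (!k->name)
		exit(1);
	k->next = list;
	return k;
}

/* Walk the entries, keep a copy of each name, then free entry and name. */
static struct kept *consume_entries(struct entry *entry, struct kept *list)
{
	while (entry) {
		struct entry *next = entry->next;

		list = keep_name(list, entry->name);
		free(entry->name);	/* safe: the list has its own copy */
		free(entry);
		entry = next;
	}
	return list;
}
```

The cost is one allocation and copy per kept name; avoiding it would 
mean tracking which names may be handed over without copying.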

Minor page faults track maximum RSS very closely (each page fault maps 
one page into memory), so the reduction from 125376 page faults to 43921 
means a rough reduction of VM footprint from almost half a gigabyte to 
about a third of that (with 4kB pages). Those numbers were also 
double-checked by looking at "top" while the process was running.
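
The page-count arithmetic can be checked with a couple of small helpers. 
This is a sketch that assumes 4096-byte pages (typical on x86, but an 
assumption here); pages_to_mb() and remaining_fraction() are names made 
up for the illustration.

```c
#include <assert.h>

/* Assumption: 4096-byte pages; adjust for other page sizes. */
#define PAGE_BYTES 4096.0

/* Convert a minor-page-fault count into megabytes (2^20 bytes). */
static double pages_to_mb(unsigned long pages)
{
	return (double)pages * PAGE_BYTES / (1024.0 * 1024.0);
}

/* Fraction of the original footprint that remains afterwards. */
static double remaining_fraction(unsigned long before, unsigned long after)
{
	assert(before > 0);
	return (double)after / (double)before;
}
```

Under that assumption, 125376 pages come to 489.75MB and 43921 pages to 
roughly 171.6MB, a remaining fraction of about 0.35.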

(Side note: at least part of the remaining VM footprint is the mapping of 
the 177MB pack-file, so the remaining memory use is at least partly "well 
behaved" from a project caching perspective).

For the current git archive itself, the memory usage for a "--all 
--objects" rev-list invocation dropped from 7128 pages to 2318 (27MB to 
9MB), so the reduction seems to hold for much smaller projects too.

For regular "git-rev-list" usage (i.e. without the "--objects" flag) this 
patch has no impact.

Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx>
---
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index f11dbd6..5277d3c 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -103,6 +103,7 @@ static struct object_list **process_blob
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return p;
 	obj->flags |= SEEN;
+	name = strdup(name);
 	return add_object(obj, p, path, name);
 }
 
@@ -122,6 +123,7 @@ static struct object_list **process_tree
 	if (parse_tree(tree) < 0)
 		die("bad tree object %s", sha1_to_hex(obj->sha1));
 	obj->flags |= SEEN;
+	name = strdup(name);
 	p = add_object(obj, p, path, name);
 	me.up = path;
 	me.elem = name;
@@ -134,6 +136,7 @@ static struct object_list **process_tree
 			p = process_tree(entry->item.tree, p, &me, entry->name);
 		else
 			p = process_blob(entry->item.blob, p, &me, entry->name);
+		free(entry->name);
 		free(entry);
 		entry = next;
 	}