Re: git and larger trees, not so fast?

David Kastrup <dak@xxxxxxx> · Thu, 09 Aug 2007 20:06:40 +0200

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Thu, 9 Aug 2007, moe wrote:
>> 
>> i made some tests on latest master branch
>> (1.5.3.rc4.29.g74276) and it seems like git
>> hits a wall somewhere above ~50k files.
>
> Good catch. Definitely not acceptable performance.
>
> We seem to spend a lot of our time in memcpy:

[...]

> Doing an ltrace on it shows tons and tons of:
>
> 	...
> 	strlen("35")
> 	strlen("349")
> 	calloc(1, 72)
> 	memcpy(0x73034e, "10/", 3)
> 	memcpy(0x730351, "349", 4)
> 	memmove(0x2ab637f41e80, 0x2ab637f41e78, 781768)
> 	...
>
> but I haven't looked at where they come from yet.

Ok, preaching to the pope here, but: moving memory is bad.  Make sure
data can stay where it starts.  In particular: don't use realloc.  And
if you do, grow the size _exponentially_ (like a factor of 1.5).  If
you grow the size exponentially, at least the movery hits the
algorithm with O(n lg n).  If data stays put, even in badly scattered
linear lists, we have O(n).  If you grow realloc _linearly_ (constant
size increment), then the algorithm is hit with O(n^2).

Technically, basically _all_ operations of git can be done by doing
list merges of presorted lists.  The index can be kept sorted.  All
requests into the index can be collected in readdir order into one
linear list, this gets sorted once using a merge sort with O(n lg n),
then it gets merged with the index O(n+N).  As long as the whole index
is read in, it can't be done faster.  It is not necessary to organize
the read data into trees or more complicate structures: a single
linear list is sufficient.  One can use the hierarchical structure of
a directory to shave off some part of the sorting cost, though, and it
considerably will lessen memory impact (and copying costs) if a
file/directory/tree entry can contain a relative file name and a
"pointer to prefix" where the rest of the file path is to be found.

Anyway, so much for some theory.  Now let's look at bad points in the
code, judging from your benchmarks.

A grep for realloc is appaling.  Let's see what is actually involved
here.

attr.c:

struct git_attr *git_attr(const char *name, int len)
{

	a->attr_nr = attr_nr++;
	git_attr_hash[pos] = a;

	check_all_attr = xrealloc(check_all_attr,
				  sizeof(*check_all_attr) * attr_nr);

[...]

Full O(n^2) behavior of the worst kind (increment 1!).

builtin-commit-tree.c:

static void add_buffer(char **bufp, unsigned int *sizep, const char *fmt, ...)
{
	size = *sizep;
	newsize = size + len + 1;
	alloc = (size + 32767) & ~32767;

[size rounded to next 32k (inconsistent! needs to be size+1 rounded up)]

	buf = *bufp;
	if (newsize > alloc) {
		alloc = (newsize + 32767) & ~32767;

[newsize rounded to next 32k]

		buf = xrealloc(buf, alloc);

[O(n^2): constant increment.  Important?  No idea.]

		*bufp = buf;
	}
	*sizep = newsize - 1;

	memcpy(buf + size, one_line, len);
}

[...]

int register_commit_graft(struct commit_graft *graft, int ignore_dups)
{

[...]
	if (commit_graft_alloc <= ++commit_graft_nr) {
		commit_graft_alloc = alloc_nr(commit_graft_alloc);
		commit_graft = xrealloc(commit_graft,
					sizeof(*commit_graft) *
					commit_graft_alloc);
	}
	if (pos < commit_graft_nr)
		memmove(commit_graft + pos + 1,
			commit_graft + pos,
			(commit_graft_nr - pos - 1) *
			sizeof(*commit_graft));
	commit_graft[pos] = graft;
	return 0;
}

Eeek.  Start with a linear list, not an array.

objects.c:

void add_object_array_with_mode(struct object *obj, const char *name, struct object_array *array, unsigned mode)
{
	unsigned nr = array->nr;
	unsigned alloc = array->alloc;
	struct object_array_entry *objects = array->objects;

	if (nr >= alloc) {
		alloc = (alloc + 32) * 2;
		objects = xrealloc(objects, alloc * sizeof(*objects));
		array->alloc = alloc;

Constant increment, O(n^2).

pathlist.c:

static int add_entry(struct path_list *list, const char *path)
{
	int exact_match;
	int index = get_entry_index(list, path, &exact_match);

	if (exact_match)
		return -1 - index;

	if (list->nr + 1 >= list->alloc) {
		list->alloc += 32;
		list->items = xrealloc(list->items, list->alloc
				* sizeof(struct path_list_item));
	}

Constant increment, O(n^2).

That's just a cursory examination.  In my opinion, pretty much every
realloc should be replaced by some sort of list structure.  That would
be the nicest thing.  I have sped up some awk scripts that built up
argument lists by a factor of 100 by replacing
a[index] = a[index] " " thenewstuff
with
a[index,nr[index]++] = thenewstuff
and then never concatenating the strings, but just outputting them in
a loop.

Anyway, short of that, don't realloc by fixed increments, always use
alloc_nr as soon as multiple reallocs are to be expected.

And they certainly are in some of the above cited code passages.

-- 
David Kastrup

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html