[PATCH/RFC] fast-import: insert new object entries at start of hash bucket

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: David Barr <david.barr@xxxxxxxxxxxx>

More often than not, find_object is called for recently inserted objects.
Optimise for this case by inserting new entries at the start of the chain.
This doesn't affect the cost of new inserts but reduces the cost of find
and insert for existing object entries.

Signed-off-by: David Barr <david.barr@xxxxxxxxxxxx>
Signed-off-by: Jonathan Nieder <jrnieder@xxxxxxxxx>
---
Hi,

After importing the first 500000 revs or so of the ASF repo with
svn-fe, it seems that fast-import gets a bit sluggish because its
object table fills up.  David noticed that one can get a bit of a
speed-up by noticing that objects are likely to be accessed soon after
they are inserted in this workflow.  (The basis for most svn deltas
comes from the revision just dumped.)

I would guess other workflows would display the same tendency if any;
at least, the same blob is likely to come up repeatedly in a few revs
and then not be mentioned for a while.  Though maybe there is some
reason to make the opposite assumption that I have not thought of.
Shawn?

A more dramatic improvement comes from increasing the size of the hash
table; in particular, David noticed it allows the CPU usage to
increase from ~60% to 100% of one core.  So presumably we should make
the size of the hash table dynamic.

Other aspects to investigate: choice of hash function; is it worth
moving accessed entries to the front of hash chains when they already
exist?

Timings in interesting workflows would also be nice.

Regardless, this seems like a good change on its own, especially
because it simplifies the code.

Thoughts?
Jonathan

 fast-import.c |    9 ++-------
 1 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 77549eb..e2f4874 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -539,22 +539,17 @@ static struct object_entry *insert_object(unsigned char *sha1)
 {
 	unsigned int h = sha1[0] << 8 | sha1[1];
 	struct object_entry *e = object_table[h];
-	struct object_entry *p = NULL;
 
 	while (e) {
 		if (!hashcmp(sha1, e->idx.sha1))
 			return e;
-		p = e;
 		e = e->next;
 	}
 
 	e = new_object(sha1);
-	e->next = NULL;
+	e->next = object_table[h];
 	e->idx.offset = 0;
-	if (p)
-		p->next = e;
-	else
-		object_table[h] = e;
+	object_table[h] = e;
 	return e;
 }
 
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]