From: David Barr <david.barr@xxxxxxxxxxxx> More often than not, find_object is called for recently inserted objects. Optimise for this case by inserting new entries at the start of the chain. This doesn't affect the cost of new inserts but reduces the cost of find and insert for existing object entries. Signed-off-by: David Barr <david.barr@xxxxxxxxxxxx> Signed-off-by: Jonathan Nieder <jrnieder@xxxxxxxxx> --- Hi, After importing the first 500000 revs or so of the ASF repo with svn-fe, it seems that fast-import gets a bit sluggish because its object table fills up. David noticed that one can get a bit of a speed-up by noticing that objects are likely to be accessed soon after they are inserted in this workflow. (The basis for most svn deltas comes from the revision just dumped.) I would guess other workflows would display the same tendency if any; at least, the same blob is likely to come up repeatedly in a few revs and then not be mentioned for a while. Though maybe there is some reason to make the opposite assumption that I have not thought of. Shawn? A more dramatic improvement comes from increasing the size of the hash table; in particular, David noticed it allows the CPU usage to increase from ~60% to 100% of one core. So presumably we should make the size of the hash table dynamic. Other aspects to investigate: choice of hash function; is it worth moving accessed entries to the front of hash chains when they already exist? Timings in interesting workflows would also be nice. Regardless, this seems like a good change on its own, especially because it simplifies the code. Thoughts? Jonathan fast-import.c | 9 ++------- 1 files changed, 2 insertions(+), 7 deletions(-) diff --git a/fast-import.c b/fast-import.c index 77549eb..e2f4874 100644 --- a/fast-import.c +++ b/fast-import.c @@ -539,22 +539,17 @@ static struct object_entry *insert_object(unsigned char *sha1) { unsigned int h = sha1[0] << 8 | sha1[1]; struct object_entry *e = object_table[h]; - struct object_entry *p = NULL; while (e) { if (!hashcmp(sha1, e->idx.sha1)) return e; - p = e; e = e->next; } e = new_object(sha1); - e->next = NULL; + e->next = object_table[h]; e->idx.offset = 0; - if (p) - p->next = e; - else - object_table[h] = e; + object_table[h] = e; return e; } -- 1.7.2.3 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html