Hi,

David Barr wrote:

> Signed-off-by: David Barr <david.barr@xxxxxxxxxxxx>

Thanks, this is a welcome change.  But perhaps it would be nice to
explain why, here? :)  E.g., what is stored in the atom table?  Does it
tend to get big?  Does the existing code allow it to grow?  This change
will allow it to grow, right?  What is the downside to this change (if
any)?  Especially, numbers (timings) illustrating the effect on typical
use and the effect on scalability would be interesting.

> ---
>  fast-import.c |   17 ++++++++++-------
>  1 files changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/fast-import.c b/fast-import.c
> index 65d65bf..0592b21 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -300,9 +300,8 @@ static size_t total_allocd;
>  static struct mem_pool *mem_pool;
>
>  /* Atom management */
> -static unsigned int atom_table_sz = 4451;
>  static unsigned int atom_cnt;
> -static struct atom_str **atom_table;
> +static struct hash_table atom_table;
>
>  /* The .pack file being generated */
>  static unsigned int pack_id;
> @@ -680,10 +679,11 @@ static struct object_entry *find_mark(uintmax_t idnum)
>
>  static struct atom_str *to_atom(const char *s, unsigned short len)
>  {
> -	unsigned int hc = hc_str(s, len) % atom_table_sz;
> +	unsigned int hc = hc_str(s, len);
>  	struct atom_str *c;
> +	void **pos;
>
> -	for (c = atom_table[hc]; c; c = c->next_atom)
> +	for (c = lookup_hash(hc, &atom_table); c; c = c->next_atom)
>  		if (c->str_len == len && !strncmp(s, c->str_dat, len))
>  			return c;
>
> @@ -691,8 +691,12 @@ static struct atom_str *to_atom(const char *s, unsigned short len)
>  	c->str_len = len;
>  	strncpy(c->str_dat, s, len);
>  	c->str_dat[len] = 0;
> -	c->next_atom = atom_table[hc];
> -	atom_table[hc] = c;
> +	c->next_atom = NULL;
> +	pos = insert_hash(hc, c, &atom_table);
> +	if (pos) {
> +		c->next_atom = *pos;
> +		*pos = c;
> +	}

If I understand correctly, this puts new atoms at the start of the
chain, just like v1.7.4-rc0~40^2 (fast-import: insert new object
entries at start of hash bucket, 2010-11-23) did for objects.  Did you
measure and find this faster, or is it just for simplicity or
consistency?  (I'd personally be fine with it either way, but it seems
prudent to ask.)

>  	atom_cnt++;
>  	return c;
>  }
> @@ -3263,7 +3267,6 @@ int main(int argc, const char **argv)
>
>  	alloc_objects(object_entry_alloc);
>  	strbuf_init(&command_buf, 0);
> -	atom_table = xcalloc(atom_table_sz, sizeof(struct atom_str*));
>  	branch_table = xcalloc(branch_table_sz, sizeof(struct branch*));
>  	avail_tree_table = xcalloc(avail_tree_table_sz, sizeof(struct avail_tree_content*));
>  	marks = pool_calloc(1, sizeof(struct mark_set));

We never call init_hash.  That's technically safe because init_hash
just zeroes out the table, but I think I'd rather see us using it
anyway or documenting in api-hash.txt that it's safe not to use.

Looks good.  Will queue to give it some testing.
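For readers following along, here is a minimal, self-contained sketch of the pattern the patch relies on. It is a string-interning table keyed by the full, unreduced hash value. An insert that finds the hash already present returns a pointer to the occupied slot, and the caller then prepends the new atom to that slot's chain, as to_atom does with insert_hash. Everything here is hypothetical and simplified: a linear-probing table that doubles when half full, a toy multiply-by-31 hash, and the names lookup, insert, intern and struct atom. It is not git's hash.c or fast-import's code. It also leans on a zero-initialized static table needing no explicit init call, which is the property the init_hash remark is about.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table: open addressing with linear probing, keyed by the
 * full (unreduced) hash value, doubling when half full. */
struct slot { unsigned int hash; void *ptr; };
struct table { unsigned int size, nr; struct slot *array; };

static void *lookup(unsigned int hash, const struct table *t)
{
	unsigned int i;
	if (!t->size)
		return NULL;
	assert(t->nr < t->size); /* an empty slot always ends the probe */
	for (i = hash % t->size; t->array[i].ptr; i = (i + 1) % t->size)
		if (t->array[i].hash == hash)
			return t->array[i].ptr;
	return NULL;
}

/* Returns NULL if a new slot was filled, else a pointer to the value
 * already stored under this hash (so the caller can chain onto it). */
static void **insert_raw(unsigned int hash, void *ptr, struct table *t)
{
	unsigned int i;
	for (i = hash % t->size; t->array[i].ptr; i = (i + 1) % t->size)
		if (t->array[i].hash == hash)
			return &t->array[i].ptr;
	t->array[i].hash = hash;
	t->array[i].ptr = ptr;
	t->nr++;
	return NULL;
}

static void grow(struct table *t)
{
	struct table old = *t;
	unsigned int i;
	t->size = old.size ? old.size * 2 : 64;
	t->nr = 0;
	t->array = calloc(t->size, sizeof(*t->array));
	if (!t->array)
		abort();
	for (i = 0; i < old.size; i++)
		if (old.array[i].ptr)
			insert_raw(old.array[i].hash, old.array[i].ptr, t);
	free(old.array);
}

static void **insert(unsigned int hash, void *ptr, struct table *t)
{
	if (t->nr >= t->size / 2) /* keep at least half the slots empty */
		grow(t);
	return insert_raw(hash, ptr, t);
}

struct atom {
	struct atom *next_atom; /* other atoms with the same full hash */
	unsigned short len;
	char str[];             /* NUL-terminated copy of the string */
};

/* Toy hash for illustration only. */
static unsigned int hash_str(const char *s, unsigned short len)
{
	unsigned int h = 0;
	while (len--)
		h = h * 31 + (unsigned char)*s++;
	return h;
}

/* Static storage is zero-initialized, so no explicit init call is needed. */
static struct table atoms;

static struct atom *intern(const char *s, unsigned short len)
{
	unsigned int hc = hash_str(s, len);
	struct atom *c;
	void **pos;

	for (c = lookup(hc, &atoms); c; c = c->next_atom)
		if (c->len == len && !memcmp(s, c->str, len))
			return c;

	c = malloc(sizeof(*c) + len + 1);
	if (!c)
		abort();
	c->len = len;
	memcpy(c->str, s, len);
	c->str[len] = '\0';
	c->next_atom = NULL;
	pos = insert(hc, c, &atoms);
	if (pos) {
		/* Hash already present: put the new atom at the head of the chain. */
		c->next_atom = *pos;
		*pos = c;
	}
	return c;
}
```

With this toy hash, "Aa" and "BB" hash to the same value. Interning "Aa" and then "BB" therefore leaves "BB" at the head of that slot's chain with "Aa" behind it. That is the "new entries at start of bucket" behaviour asked about in the review.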