Re: [PATCH] Preallocate hash tables when the number of inserts are known in advance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Duy Nguyen <pclouds@xxxxxxxxx> writes:

> On Sun, Mar 17, 2013 at 12:34 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> writes:
>>
>>> This avoids unnecessary re-allocations and reinsertions. On webkit.git
>>> (i.e. about 182k inserts to the name hash table), this reduces about
>>> 100ms out of 3s user time.
>>>
>>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
>>
>> I think this is a very good idea, but I would prefer the second
>> parameter to the "preallocate" to be "expected number of entries"
>> and have the preallocate, which is a part of the hash API, decide
>> how to inflate that number to adjust to the desired load factor of
>> the hash table.  We shouldn't have to adjust the caller when the
>> internal implementation of the hash table changes.
>
> OK will do.

I've squashed it in myself, so no need to resend only for this.

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 8d3d9bb..6c7a72f 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -389,7 +389,7 @@ static int find_exact_renames(struct diff_options *options)
 	struct hash_table file_table;
 
 	init_hash(&file_table);
-	preallocate_hash(&file_table, (rename_src_nr + rename_dst_nr) * 2);
+	preallocate_hash(&file_table, rename_src_nr + rename_dst_nr);
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, -1, i, rename_src[i].p->one);
 
diff --git a/hash.h b/hash.h
index 244d1fe..1d43ac0 100644
--- a/hash.h
+++ b/hash.h
@@ -40,11 +40,11 @@ static inline void init_hash(struct hash_table *table)
 	table->array = NULL;
 }
 
-static inline void preallocate_hash(struct hash_table *table, unsigned int size)
+static inline void preallocate_hash(struct hash_table *table, unsigned int elts)
 {
 	assert(table->size == 0 && table->nr == 0 && table->array == NULL);
-	table->size = size;
-	table->array = xcalloc(sizeof(struct hash_table_entry), size);
+	table->size = elts * 2;
+	table->array = xcalloc(sizeof(struct hash_table_entry), table->size);
 }
 
 #endif
diff --git a/name-hash.c b/name-hash.c
index 90c7b99..2a1f108 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -93,7 +93,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	if (istate->name_hash_initialized)
 		return;
 	if (istate->cache_nr)
-		preallocate_hash(&istate->name_hash, istate->cache_nr * 2);
+		preallocate_hash(&istate->name_hash, istate->cache_nr);
 	for (nr = 0; nr < istate->cache_nr; nr++)
 		hash_index_entry(istate, istate->cache[nr]);
 	istate->name_hash_initialized = 1;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]