Re: [PATCH 3/5] name-hash: precompute hash values during preload-index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff Hostetler <Jeff.Hostetler@xxxxxxxxxxxxx> writes:

> I looked at doing this, but I didn't think the complexity and overhead to
> forward search for peers at the current level didn't warrant the limited gains.

It seems that I wasn't clear what I meant.  I didn't mean anything
complex like what you said.

Just something simple, like this on top of yours, that passes and
compares with only the previous one.  I do not know if that gives
any gain, though ;-).

 cache.h         |  2 +-
 name-hash.c     | 11 +++++++++--
 preload-index.c |  4 +++-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 390aa803df..bd2980f6e3 100644
--- a/cache.h
+++ b/cache.h
@@ -233,7 +233,7 @@ struct cache_entry {
 #error "CE_EXTENDED_FLAGS out of range"
 #endif
 
-void precompute_istate_hashes(struct cache_entry *ce);
+void precompute_istate_hashes(struct cache_entry *ce, struct cache_entry *prev);
 
 /* Forward structure decls */
 struct pathspec;
diff --git a/name-hash.c b/name-hash.c
index f95054f44c..5e09b79170 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -300,7 +300,7 @@ void free_name_hash(struct index_state *istate)
  * non-skip-worktree items (since status should not observe skipped items), but
  * because lazy_init_name_hash() hashes everything, we force it here.
  */
-void precompute_istate_hashes(struct cache_entry *ce)
+void precompute_istate_hashes(struct cache_entry *ce, struct cache_entry *prev)
 {
 	int namelen = ce_namelen(ce);
 
@@ -312,7 +312,14 @@ void precompute_istate_hashes(struct cache_entry *ce)
 		ce->precomputed_hash.root_entry = 1;
 	} else {
 		namelen--;
-		ce->precomputed_hash.dir = memihash(ce->name, namelen);
+
+		if (prev && 
+		    prev->precomputed_hash.initialized &&
+		    namelen <= ce_namelen(prev) &&
+		    !memcmp(ce->name, prev->name, namelen))
+			ce->precomputed_hash.dir = prev->precomputed_hash.dir;
+		else
+			ce->precomputed_hash.dir = memihash(ce->name, namelen);
 		ce->precomputed_hash.name = memihash_continue(
 			ce->precomputed_hash.dir, ce->name + namelen,
 			ce_namelen(ce) - namelen);
diff --git a/preload-index.c b/preload-index.c
index 602737f9d0..784378ffac 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -37,6 +37,7 @@ static void *preload_thread(void *_data)
 	struct thread_data *p = _data;
 	struct index_state *index = p->index;
 	struct cache_entry **cep = index->cache + p->offset;
+	struct cache_entry *previous = NULL;
 	struct cache_def cache = CACHE_DEF_INIT;
 
 	nr = p->nr;
@@ -47,7 +48,8 @@ static void *preload_thread(void *_data)
 		struct cache_entry *ce = *cep++;
 		struct stat st;
 
-		precompute_istate_hashes(ce);
+		precompute_istate_hashes(ce, previous);
+		previous = ce;
 
 		if (ce_stage(ce))
 			continue;




> (I was just looking at the complexity of clear_ce_flags_1() in unpack-trees.c
> and how hard it has to look to find the end of the current directory and the
> effect that that has on the recursion and it felt like too much work for the
> potential gain.)
>
> Whereas remembering the previous one was basically free.  Granted, it only
> helps us for adjacent files in the index, so it's not perfect, but gives us the
> best bang for the buck.
>
> Jeff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]