Re: I'm a total push-over..

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Tue, 22 Jan 2008 18:58:53 -0800 (PST)

On Tue, 22 Jan 2008, Junio C Hamano wrote:
> 
>  - You might want to store the hash table (once computed) in the
>    index extension section, and lazily unpack the table the
>    first time index_name_exists() or set_index_entry() is called
>    on the given istate, instead of unpacking it immediately when
>    you read from the disk.  That way, ls-files does not have to
>    suffer at all.

I really hate that. 

Basically, I dislike having two copies of the same data. If something can 
be computed from something else, then only the original data should exist, 
and the other thing should be recomputed. Otherwise you easily get into 
situations where you spend a lot of time maintaining the other copy, or 
worse, you have inconsistent data and it's really subtle what is going on.

Also, one of the ideas behind the index is that would depend on your 
notion of what is "equivalent", which is actually somehing fairly fluid. 
In fact, it's likely going to depend on a config option.

So encoding the indexing on disk, when it can change when you do a simple 
"git config", or even just depending on your LANG environment variable, 
seems like a singularly bad idea, even if it wasn't for the coherence.

I did consider doing the indexing only on demand, and we can certainly 
simply just "turn it off" when we know it's never going to get used (ie 
"git ls-files"). So in that sense, it's easy to get rid of the overhead, 
but it didn't really seem like the conceptual complexity (even if it's 
just a couple of lines) is really worth it. It's not like git ls-files is 
really performance-critical anyway.

>  - You would need to get rid of the table in discard_index().

Now this, of course, is obviously true.

And the patch to do that is very simple too. No need to walk any chains, 
since the "free(istate->alloc);" will release all the pre-allocated 
cache_entry structures, and the rest are (necessarily) leaked anyway.

[ Side note for non-Junios: the leaking of cache_entry structures isn't 
  new, we've always done it, and it's even done on purpose. The common 
  case is that there is one *big* allocation (istate->alloc) that contains 
  all the original cache entries.

  There are usually none, or only a very few individual allocations, and 
  we don't even keep track of them. With the new in-memory format, we 
  could make a special flag that does "is this cache-entry an individual 
  allocation or not" (or we could even just see if they are inside the 
  "alloc" range), but the common case really should be that there's just a 
  couple of them, and we just drop them rather than tracking them. ]

Here.

		Linus

---
 read-cache.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 33a8ca5..abee0fc 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1142,6 +1142,7 @@ int discard_index(struct index_state *istate)
 	istate->cache_nr = 0;
 	istate->cache_changed = 0;
 	istate->timestamp = 0;
+	free_hash(&istate->name_hash);
 	cache_tree_free(&(istate->cache_tree));
 	free(istate->alloc);
 	istate->alloc = NULL;
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html