Re: [PATCH 05/22] read-cache: add index reading api

Thomas Gummerer <t.gummerer@xxxxxxxxx> · Mon, 08 Jul 2013 22:10:58 +0200

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Thomas Gummerer <t.gummerer@xxxxxxxxx> writes:
>
>> Add an api for access to the index file.  Currently there is only a very
>> basic api for accessing the index file, which only allows a full read of
>> the index, and lets the users of the data filter it.  The new index api
>> gives the users the possibility to use only part of the index and
>> provides functions for iterating over and accessing cache entries.
>>
>> This simplifies future improvements to the in-memory format, as changes
>> will be concentrated on one file, instead of the whole git source code.
>>
>> Signed-off-by: Thomas Gummerer <t.gummerer@xxxxxxxxx>
>> ---
>>  cache.h         |  57 +++++++++++++++++++++++++++++-
>>  read-cache-v2.c |  96 +++++++++++++++++++++++++++++++++++++++++++++++--
>>  read-cache.c    | 108 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>>  read-cache.h    |  12 ++++++-
>>  4 files changed, 263 insertions(+), 10 deletions(-)
>>
>> diff --git a/cache.h b/cache.h
>> index 5082b34..d38dfbd 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -127,7 +127,8 @@ struct cache_entry {
>>  	unsigned int ce_flags;
>>  	unsigned int ce_namelen;
>>  	unsigned char sha1[20];
>> -	struct cache_entry *next;
>> +	struct cache_entry *next; /* used by name_hash */
>> +	struct cache_entry *next_ce; /* used to keep a list of cache entries */
>
> The reader often needs to rewind the read-pointer partially while
> walking the index (e.g. next_cache_entry() in unpack-trees.c and how
> the o->cache_bottom position is used throughout the subsystem).  I
> am not sure if this singly-linked list is a good way to go.

I'm not very familiar with the unpack-trees code, but from a quick look
the pointer (or position in the cache) is always only moved forward.  A
problem I do see though is skipping a number of entries at once.  An
example for that below:
			int matches;
			matches = cache_tree_matches_traversal(o->src_index->cache_tree,
							       names, info);
			/*
			 * Everything under the name matches; skip the
			 * entire hierarchy.  diff_index_cached codepath
			 * special cases D/F conflicts in such a way that
			 * it does not do any look-ahead, so this is safe.
			 */
			if (matches) {
				o->cache_bottom += matches;
				return mask;
			}

This could probably be transformed into something like
skip_cache_tree_matches(cache-tree, names, info);

I'll take some time to familiarize myself with the unpack-trees code to
see if I can find a better solution than this, and if there are more
pitfalls.

>> +/*
>> + * Options by which the index should be filtered when read partially.
>> + *
>> + * pathspec: The pathspec which the index entries have to match
>> + * seen: Used to return the seen parameter from match_pathspec()
>> + * max_prefix, max_prefix_len: These variables are set to the longest
>> + *     common prefix, the length of the longest common prefix of the
>> + *     given pathspec
>
> These probably should use "struct pathspec" abstration, not just the
> "array of raw strings", no?

Yes, thanks, that's probably a good idea.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html