Junio C Hamano <gitster@xxxxxxxxx> writes:

> Thomas Gummerer <t.gummerer@xxxxxxxxx> writes:
>
>> Add an api for access to the index file.  Currently there is only a very
>> basic api for accessing the index file, which only allows a full read of
>> the index, and lets the users of the data filter it.  The new index api
>> gives the users the possibility to use only part of the index and
>> provides functions for iterating over and accessing cache entries.
>>
>> This simplifies future improvements to the in-memory format, as changes
>> will be concentrated on one file, instead of the whole git source code.
>>
>> Signed-off-by: Thomas Gummerer <t.gummerer@xxxxxxxxx>
>> ---
>>  cache.h         |  57 +++++++++++++++++++++++++++++-
>>  read-cache-v2.c |  96 +++++++++++++++++++++++++++++++++++++++++++++++--
>>  read-cache.c    | 108 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>>  read-cache.h    |  12 ++++++-
>>  4 files changed, 263 insertions(+), 10 deletions(-)
>>
>> diff --git a/cache.h b/cache.h
>> index 5082b34..d38dfbd 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -127,7 +127,8 @@ struct cache_entry {
>>  	unsigned int ce_flags;
>>  	unsigned int ce_namelen;
>>  	unsigned char sha1[20];
>> -	struct cache_entry *next;
>> +	struct cache_entry *next; /* used by name_hash */
>> +	struct cache_entry *next_ce; /* used to keep a list of cache entries */
>
> The reader often needs to rewind the read-pointer partially while
> walking the index (e.g. next_cache_entry() in unpack-trees.c and how
> the o->cache_bottom position is used throughout the subsystem).  I
> am not sure if this singly-linked list is a good way to go.

I'm not very familiar with the unpack-trees code, but from a quick look
the pointer (or position in the cache) is always only moved forward.  A
problem I do see though is skipping a number of entries at once.
An example of that is below:

	int matches;
	matches = cache_tree_matches_traversal(o->src_index->cache_tree,
					       names, info);
	/*
	 * Everything under the name matches; skip the
	 * entire hierarchy.  diff_index_cached codepath
	 * special cases D/F conflicts in such a way that
	 * it does not do any look-ahead, so this is safe.
	 */
	if (matches) {
		o->cache_bottom += matches;
		return mask;
	}

This could probably be transformed into something like
skip_cache_tree_matches(cache-tree, names, info);

I'll take some time to familiarize myself with the unpack-trees code to
see if I can find a better solution than this, and if there are more
pitfalls.

>> +/*
>> + * Options by which the index should be filtered when read partially.
>> + *
>> + * pathspec: The pathspec which the index entries have to match
>> + * seen: Used to return the seen parameter from match_pathspec()
>> + * max_prefix, max_prefix_len: These variables are set to the longest
>> + *     common prefix, the length of the longest common prefix of the
>> + *     given pathspec
>
> These probably should use "struct pathspec" abstraction, not just the
> "array of raw strings", no?

Yes, thanks, that's probably a good idea.
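
To make the skipping concern concrete, here is a rough sketch of what
"skip n entries" might look like on a next_ce-style list, compared with
the "o->cache_bottom += matches" idiom on an array position.  This is
not code from the patch: struct entry is a simplified stand-in for
struct cache_entry, and skip_entries() and skip_position() are
hypothetical helpers named only for illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for git's struct cache_entry. */
struct entry {
	const char *name;
	struct entry *next_ce;	/* next entry in index order, NULL at the end */
};

/*
 * Hypothetical helper: advance n entries along the next_ce list.
 * This takes O(n) pointer hops and returns NULL if the list ends first.
 */
struct entry *skip_entries(struct entry *ce, size_t n)
{
	while (ce && n--)
		ce = ce->next_ce;
	return ce;
}

/*
 * Array-position counterpart, in the spirit of
 * "o->cache_bottom += matches": a single addition, clamped to the
 * number of entries nr.
 */
size_t skip_position(size_t pos, size_t n, size_t nr)
{
	return (n < nr - pos) ? pos + n : nr;
}
```

With a list, skipping a whole matched hierarchy costs one hop per
skipped entry, and rewinding is not possible at all without keeping an
earlier pointer around; with a position it is plain arithmetic in both
directions.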