On Wed, Aug 8, 2012 at 8:38 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > If the workload we _care_ about is served better by using an API > that works over an in-core tree-shaped index data structure, I do > not think it is unreasonable to read the v2 on-disk format and > represent it as a tree-shaped index while we read it. Of course, > there are things that are not as effective when reading from the > flat v2 on-disk format (e.g. path limited reading will have to at > least _scan_ the whole thing, even though it may process only the > entries that match the pathspec) compared to reading from a > tree-shaped on-disk format, but I doubt that the difference between > the cost of reading into a flat array and the cost of reading and > forming whatever non-flat data structure you seem to think is better > is so big that it would negate the benefit of using a better in-core > structure. OK how about this. The general idea is preserve/extend current flat index API and add a new (tree-based) one. Index users can use either. They can even mix them up (which they do because we can't just flip the API in one day for about 200 source files). The day that unpack_trees() is converted to tree API, I will declare v5 victory ;) = Cleanup = struct cache_entry becomes partly opaque. ce_ctime..ce_gid are hidden in -v2.c and -v5.c. We only expose ce_size, ce_flags, ce_namelen, name and sha1 to index users. Extra v5 fields like ce_stat_crc, next and dir_next are also hidden. These fields can be put in a real struct in read-cache.h, which is supposedly included by -v2.c and -v5.c = Updating = All index update API (add_index_entry, add_to_index, remove_index_entry_at, remove_marked_cached_entries) are hooked by v5 when the loaded index is v5. v5 can update internal data when these are called (e.g. conflict resolution), or just mark them "dirty" to be worked on later in flush_index(). Anybody who updates a cache_entry is supposed to call cache_entry_updated() function, which is no-op for v2 but v5 may want to watch this activity. Refreshing index is a special operation. Of course it's hooked by v5. v5 may need its own implementation because it could walk working tree and index tree at the same time. Of course v5 impl must also update flat API data structure along the way. A new function flush_index() is introduced, where v5 can update all internal data and keep it in sync with index_state. When flat/tree APIs are mixed, flush_index() must be called when switching from flat API to tree API. To help v5 deal with index rewrite in unpack_trees(), index_bulk_update() may be introduced, which tells v5 "we are going to do a lot of adding/removing/shuffling, keep your actions to minimum, you most likely have to rebuild the trees at flush_index() anyway" New API may be introduced for some big operations if it proves v5-beneficial. I'm thinking of adding/removing a bunch of files by pathspec, where v5 can walk working directory at the same time it walks index directory tables. = Tree traversal = I don't see big problems here. We support opendir/readdir-like API for tree traversing (with pathspec filtering). We also support lookup_cache_entry to get cache_entry* of a certain path. When tree traversal gets to a conflict entry, it lets the caller know there's a conflict entry, it does not traverse through stage 1-3 during traversal. Caller is expected to use conflict lookup API for that. We also support reading partial index, filtered by pathspec. On v2, it reads full index. = Tree update = At some point we may want to work on trees exclusively. Any operations here must keep flat API data structure in sync. We may want to postpone the sync if it's a lot of work, by doing all the work in flush_index() before caller switches from tree API to flat API again. = Flat API deprecation = At some point, tree update API will not update flat API any more unless explicitly asked by caller. I don't expect "cache" in struct index_state to be removed, unless we do really good merges using tree API. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html