Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> On Sun, May 27, 2012 at 4:27 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Thomas Gummerer <t.gummerer@xxxxxxxxx> writes:
>>
>>> Ah ok, thanks for the clarification, I understand what you meant now.
>>> I think however, that it's not very beneficial to do this conversion
>>> now. git ls-files needs the whole index file anyway, so it's probably
>>> not a very good test.
>>
>> Think about "git ls-files t/" and "git ls-files -u".
>
> Or harder things like "ls-files -- 't/*.sh'"
>
>> The former obviously does *not* have to look at the whole thing, even
>> though the current code assumes the in-core data structure that has the
>> whole thing in a flat array. IIRC, you had unmerged entries tucked at the
>> end outside the main index data, so the latter is also an interesting
>> demonstration of how wonderful the new data format could be.
>
> and "ls-files -uc" can show how you combine unmerged entries back.
> There's also entry existence check deep in "ls-files -o" that you can
> show how good bsearch on trees is, though that might be going too far
> for an experiment because the call chain is really deep, way outside
> ls-files.c:
>
>   show_files (builtin/ls-files.c)
>   fill_directory (dir.c)
>   read_directory
>   read_directory_recursive
>   treat_path
>   treat_one_path
>   treat_directory
>   directory_exists_in_index
>   cache_pos_name (read-cache.c)
>
> I just want to make sure that by exercising the new format with some
> real problems, we are certain we don't overlook anything in designing
> the format (or else could be fixed before finalizing it).

I envision an index API that more strictly controls access to the index.
Right now the API consists largely of read_index, write_index and the flat
the_index->cache array of entries. Eventually it will have to be a family
of calls that support the v5 format, and boil down to suitable wrappers
for older ones.
For example (just tossing up ideas):

  index_open(struct index_state *index, int fd):
    initialization, checking, leaves the "real" data fields empty

  index_load_filtered(..., const char **pathspec):
    load everything needed to satisfy queries filtered by 'pathspec'

  index_for_each_entry(..., void (*callback)(struct cache_entry *ent)):
    like the current hand-rolled looping

  index_for_each_entry_filtered(..., void (*callback)(struct cache_entry *ent), char **pathspec):
    ditto but for a pathspec lookup

etc.

Then I will twist Duy's words to mean that you should make git-ls-files
the poster child of this new API for development and profiling purposes
:-)

Actually converting the rest of the git code base to such an API is too
big an undertaking for the summer, so please don't stray on that path.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
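
To make the filtered-iteration idea above a bit more concrete, here is a
minimal, self-contained sketch in C. It is only an illustration under
stated assumptions, not git code: every name in it (toy_entry, toy_index,
toy_for_each_entry, toy_for_each_entry_filtered, toy_count_cb) is
hypothetical, the "pathspec" is reduced to a single leading-directory
prefix such as "t/", and the index is assumed to be an array sorted by
name. Under those assumptions a binary search finds the start of the
matching range, so a query like "ls-files t/" would not have to walk the
whole array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's struct cache_entry: only the path. */
struct toy_entry {
	const char *name;
};

/* Hypothetical stand-in for struct index_state: entries sorted by name. */
struct toy_index {
	const struct toy_entry *entries;
	size_t nr;
};

/* Callback type; cb_data is passed through untouched to the callback. */
typedef void (*toy_entry_fn)(const struct toy_entry *ent, void *cb_data);

/* Example callback: counts entries in the size_t that cb_data points to. */
void toy_count_cb(const struct toy_entry *ent, void *cb_data)
{
	(void)ent;
	++*(size_t *)cb_data;
}

/* Position of the first entry whose name compares >= prefix. */
static size_t toy_lower_bound(const struct toy_index *idx, const char *prefix)
{
	size_t lo = 0, hi = idx->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(idx->entries[mid].name, prefix) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Unfiltered variant: visits every entry, returns the number visited. */
size_t toy_for_each_entry(const struct toy_index *idx,
			  toy_entry_fn fn, void *cb_data)
{
	size_t i;

	for (i = 0; i < idx->nr; i++)
		fn(&idx->entries[i], cb_data);
	return idx->nr;
}

/*
 * Filtered variant: visits only entries whose name starts with prefix.
 * Because the array is sorted, those entries form one contiguous run
 * beginning at the lower bound of prefix.
 */
size_t toy_for_each_entry_filtered(const struct toy_index *idx,
				   const char *prefix,
				   toy_entry_fn fn, void *cb_data)
{
	size_t len = strlen(prefix);
	size_t i = toy_lower_bound(idx, prefix);
	size_t visited = 0;

	for (; i < idx->nr && !strncmp(idx->entries[i].name, prefix, len); i++) {
		fn(&idx->entries[i], cb_data);
		visited++;
	}
	return visited;
}
```

A caller would fill a toy_index with name-sorted entries and pass a
prefix plus a callback; real pathspecs (globs like 't/*.sh', multiple
patterns) would need more than this prefix match, and a v5-style on-disk
layout could presumably skip loading the non-matching entries at all,
which this in-memory sketch does not attempt to show.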