Nguyán ThÃi Ngác Duy wrote: > The function will reconstruct directory structure from index and feed > excluded_from_list() with proper dtype. Okay, so I guess what this means is: Matching index entries against an excludes file currently has two problems. First, there's no function to do it. Code paths (like sparse checkout) that wanted to try it would iterate over index entries and for each index entry pass that path to excluded_from_list(). But that is not how excluded_from_list() works; one is supposed to feed in each ancester of a path before a given path to find out if it was excluded because of some parent or grandparent matching a bigsubdirectory/ pattern despite the path not matching any .gitignore pattern directly. Second, it's inefficient. The excludes mechanism is supposed to let us block off vast swaths of the filesystem as uninteresting; separately checking every index entry doesn't fit that model. Introduce a new function to take care of both these problems. This traverses the index in depth-first order (well, that's what order the index is in) to mark un-excluded entries. Maybe some day the in-core index format will be restructured to make this sort of operation easier. Or maybe we will want to try some binary search based thing. The interface is simple enough to allow all those things. Example: clear_ce_flags(the_index.cache, the_index.cache_nr, CE_CANDIDATE, CE_CLEARME, exclude_list); would clear the CE_CLEARME flag on all index entries with CE_CANDIDATE flag and not matched by exclude_list. > --- a/unpack-trees.c > +++ b/unpack-trees.c > @@ -834,6 +834,94 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str > return mask; > } > > +/* > + * traverse the index, find every entry that matches according to > + * o->el. Do "ce_flags &= ~clear_mask" on those entries. Return the > + * number of traversed entries. > + * > + * If select_mask is non-zero, only entries whose ce_flags has on of > + * those bits enabled are traversed. > + */ > +static int clear_ce_flags_1(struct cache_entry **cache, int nr, > + int prefix_len, int match_all, > + int select_mask, int clear_mask, > + struct unpack_trees_options *o) o is only used for its excludes list. Why not cut out the middle man and take o->el as argument? State (struct-worthy?): cache pointer to an index entry prefix_len an offset into its path match_all a boolean telling whether we have some skipping to do The current path (called the prefix), including trailing '/', is cache[0]->name[0] .. cache[0]->name[prefix_len - 1] > +{ > + const char *name, *slash; > + struct cache_entry *ce = cache[0]; > + const char *prefix = ce->name; > + int processed, original_nr = nr; > + > + if (prefix_len && !match_all) { Top level gets special handling. Let's skip to it (see [*] for where we were). > + while (nr) { For each cache entry... Might be nice to phrase this as a for loop. const char *prefix = cache[0]->name; struct cache_entry ** const cache_end = cache + nr; struct cache_entry **cep; for (cep = cache; cep != cache_end; ) { struct cache_entry *ce = *cep; > + if (select_mask && !(ce->ce_flags & select_mask)) { > + cache++; > + ce = *cache; > + nr--; > + continue; > + } Some index entries are disqualified. if (select_mask) if (!(ce->ce_cflags & select_mask)) { cep++; continue; } > + if (prefix_len && strncmp(ce->name, prefix, prefix_len)) > + break; Entries not under prefix are handled by the caller. if (prefix_len) if (strncmp(ce->name, prefix, prefix_len)) return cep - cache; /* number processed */ > + name = ce->name + prefix_len; > + slash = strchr(name, '/'); > + > + /* no slash, this is a file */ > + if (!slash) { Leaf! /* File or directory? Check if it's a submodule. */ > + int dtype = ce_to_dtype(ce); > + if (match_all || > + excluded_from_list(ce->name, ce_namelen(ce), name, &dtype, o->el) > 0) > + ce->ce_flags &= ~clear_mask; > + cache++; > + ce = *cache; > + nr--; > + continue; > + } if (excluded_block || excluded_from_list(... /* Excluded? */ ce->ce_flags &= ~clear_mask; cep++; continue; } Putting the loop body in its own function could allow some nice depth reduction. > + > + /* has slash, this is a directory */ > + processed = clear_ce_flags_1(cache, nr, > + slash + 1 - ce->name, match_all, > + select_mask, clear_mask, o); > + cache += processed; > + ce = *cache; > + nr -= processed; > + } > + return original_nr - nr; /* Non-leaf. Recurse. */ cep += clear_ce_ ... } How about the other case, mentioned before (see [*] above)? assert(prefix_len); /* Not toplevel but a subdirectory. */ assert(!match_all); /* Not wholesale excluded. */ > + int dtype = DT_DIR; > + static char path[PATH_MAX]; > + int pathlen = prefix_len - 1; > + const char *basename; > + > + memcpy(path, prefix, pathlen); > + path[pathlen] = '\0'; > + basename = strrchr(path, '/'); Use memrchr? > + basename = basename ? basename+1 : path; > + switch (excluded_from_list(path, pathlen, basename, &dtype, o->el)) { > + case 0: > + while (nr && !strncmp(ce->name, prefix, prefix_len)) { > + cache++; > + ce = *cache; > + nr--; > + } > + return original_nr - nr; case 0: /* Included. Great, no flag to clear. */ for (; cep != cache_end; cep++) if (strncmp((*cep)->name, prefix, prefix_len)) break; return cep - cache; > + case 1: > + match_all = 1; > + break; case 1: /* Excluded. Great, clear them wholesale. */ match_all = 1; continue; > + default: /* case -1, undecided */ > + break; > + } case -1: /* Undecided. continue; } > + } > + return original_nr - nr; > +} } /* Ran off the end. */ return cep - cache; } That last bit could even longjmp() if expecting deeply nested directory hierarchies. (Sorry, joking, you should not do such an awful thing unless timing it proves it useful.) That was clean, thanks. Hope some of my musings are useful anyway. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html