On Tue, Apr 7, 2015 at 12:10 AM, Eric Sunshine <sunshine@xxxxxxxxxxxxxx> wrote:
> On Mon, Apr 6, 2015 at 7:48 AM, Erik Elfström <erik.elfstrom@xxxxxxxxx> wrote:
>> Before this change, clean used resolve_gitlink_ref to check for the
>> presence of nested git repositories. This had the drawback of creating
>> a ref_cache entry for every directory that should potentially be
>> cleaned. The linear search through the ref_cache list caused a massive
>> performance hit for large number of directories.
>>
>> Teach clean.c:remove_dirs to use setup.c:is_git_directory
>> instead. is_git_directory will actually open HEAD and parse the HEAD
>> ref but this implies a nested git repository and should be rare when
>> cleaning.
>>
>> Using is_git_directory should give a more standardized check for what
>> is and what isn't a git repository but also gives a slight behavioral
>> change. We will now detect and respect bare and empty nested git
>> repositories (only init run). Update t7300 to reflect this.
>>
>> The time to clean an untracked directory containing 100000 sub
>> directories went from 61s to 1.7s after this change.
>
> Impressive.
>
>> Signed-off-by: Erik Elfström <erik.elfstrom@xxxxxxxxx>
>> Helped-by: Jeff King <peff@xxxxxxxx>
>
> It is customary for your sign-off to be last.
>
> More below...
>
>> ---
>> diff --git a/builtin/clean.c b/builtin/clean.c
>> index 98c103f..e951bd9 100644
>> --- a/builtin/clean.c
>> +++ b/builtin/clean.c
>> @@ -148,6 +147,24 @@ static int exclude_cb(const struct option *opt, const char *arg, int unset)
>>         return 0;
>>  }
>>
>> +static int is_git_repository(struct strbuf *path)
>> +{
>> +       int ret = 0;
>> +       if (is_git_directory(path->buf))
>> +               ret = 1;
>> +       else {
>> +               int orig_path_len = path->len;
>> +               if (path->buf[orig_path_len - 1] != '/')
>
> Minor: I don't know how others feel about it, but I always find it a
> bit disturbing to see a potential negative array access without a
> safety check that orig_path_len is not 0, either directly in the
> conditional or as a documenting assert().
>

I think I would prefer to accept empty input and return false rather
than assert. What do you think about:

static int is_git_repository(struct strbuf *path)
{
	int ret = 0;
	size_t orig_path_len = path->len;
	if (orig_path_len == 0)
		ret = 0;
	else if (is_git_directory(path->buf))
		ret = 1;
	else {
		if (path->buf[orig_path_len - 1] != '/')
			strbuf_addch(path, '/');
		strbuf_addstr(path, ".git");
		if (is_git_directory(path->buf))
			ret = 1;
		strbuf_setlen(path, orig_path_len);
	}

	return ret;
}

Also I borrowed this pattern from remove_dirs and it has the same
problem. Should I add something like this as a separate commit?

diff --git a/builtin/clean.c b/builtin/clean.c
index ccffd8a..88850e3 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -173,7 +173,8 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 	DIR *dir;
 	struct strbuf quoted = STRBUF_INIT;
 	struct dirent *e;
-	int res = 0, ret = 0, gone = 1, original_len = path->len, len;
+	int res = 0, ret = 0, gone = 1;
+	size_t original_len = path->len, len;
 	struct string_list dels = STRING_LIST_INIT_DUP;

 	*dir_gone = 1;
@@ -201,6 +202,7 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 		return res;
 	}

+	assert(original_len > 0 && "expects non-empty path");
 	if (path->buf[original_len - 1] != '/')
 		strbuf_addch(path, '/');

>> +                       strbuf_addch(path, '/');
>> +               strbuf_addstr(path, ".git");
>> +               if (is_git_directory(path->buf))
>> +                       ret = 1;
>> +               strbuf_setlen(path, orig_path_len);
>> +       }
>> +
>> +       return ret;
>> +}
>> +
>> static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
>> int dry_run, int quiet, int *dir_gone)
>> {
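
For anyone who wants to try the append/check/restore part of
is_git_repository outside of git.git, here is a rough standalone sketch.
It is not the patch itself and covers only the branch that appends
".git". Everything in it is made up for illustration: struct path_buf
and the path_buf_* helpers are a fixed-size stand-in for strbuf,
has_dot_git is a hypothetical name, and stub_is_git_directory merely
records its argument and answers yes instead of inspecting the
filesystem the way setup.c:is_git_directory does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PATH_CAP 256

/* Hypothetical fixed-capacity stand-in for git's struct strbuf. */
struct path_buf {
	char buf[PATH_CAP];
	size_t len;
};

/* Last path handed to the stub below, so a caller can inspect it. */
static char last_checked[PATH_CAP];

/* Stub in place of is_git_directory(): records the path and says yes. */
static int stub_is_git_directory(const char *path)
{
	strncpy(last_checked, path, PATH_CAP - 1);
	last_checked[PATH_CAP - 1] = '\0';
	return 1;
}

/* Replace the contents of p with s; returns -1 if s does not fit. */
static int path_buf_set(struct path_buf *p, const char *s)
{
	size_t n = strlen(s);
	if (n >= PATH_CAP)
		return -1;
	memcpy(p->buf, s, n + 1);
	p->len = n;
	return 0;
}

/* Append s to p; returns -1 (leaving p untouched) if it does not fit. */
static int path_buf_add(struct path_buf *p, const char *s)
{
	size_t n = strlen(s);
	if (p->len + n >= PATH_CAP)
		return -1;
	memcpy(p->buf + p->len, s, n + 1);
	p->len += n;
	return 0;
}

/* Truncate p to len bytes, like strbuf_setlen(). */
static void path_buf_setlen(struct path_buf *p, size_t len)
{
	assert(len < PATH_CAP);
	p->len = len;
	p->buf[len] = '\0';
}

/*
 * Append ".git" to p (adding a '/' only when one is missing), run the
 * check on the result, then restore p. Empty input returns 0 before
 * buf[orig_len - 1] is ever read.
 */
static int has_dot_git(struct path_buf *p, int (*check)(const char *))
{
	size_t orig_len = p->len;
	int ret = 0;

	if (orig_len == 0)
		return 0;

	if ((p->buf[orig_len - 1] == '/' || path_buf_add(p, "/") == 0) &&
	    path_buf_add(p, ".git") == 0)
		ret = check(p->buf);

	path_buf_setlen(p, orig_len); /* always hand the path back unchanged */
	return ret;
}
```

Using size_t for the length means the "negative index" case turns into
a wrap-around to a huge index, so the explicit empty check is what
actually protects the access; the type change alone does not.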