On Fri, Nov 13, 2015 at 04:29:15PM +0100, Andreas Krey wrote: > > Likewise, I think dir.c:remove_dir_recurse is in a similar boat. > > Grepping for resolve_gitlink_ref, it looks like there may be others, > > too. > > Can't we handle this in resolve_gitlink_ref itself? As I understand it, > it should resolve a ref (here "HEAD") when path points to a submodule. > When there isn't one it should return -1, so: I'm not sure. I think part of the change to git-clean was that is_git_directory() is a _better_ check than "can we resolve HEAD?" because it covers empty repos, too. > diff --git a/refs.c b/refs.c > index 132eff5..f8648c5 100644 > --- a/refs.c > +++ b/refs.c > @@ -1553,6 +1553,10 @@ int resolve_gitlink_ref(const char *path, const char *refname, unsigned char *sh > if (!len) > return -1; > submodule = xstrndup(path, len); > + if (!is_git_directory(submodule)) { > + free(submodule); > + return -1; > + } > refs = get_ref_cache(submodule); > free(submodule); > > I'm way too little into the code to see what may this may get wrong. I don't think it produces wrong outcomes, but I think it's sub-optimal. In cases where we already have a ref cache, we'll hit the filesystem for each lookup to re-confirm what we already know. That doesn't affect your case, but it does when we actually _do_ have a submodule. So if we were to follow this route, I think it would go better in get_ref_cache itself (right after we determine there is no existing cache, but before we call create_ref_cache()). > But this, as well as the old hash-ref-cache patch speeds me > up considerably, in this case a git ls-files -o from half a > minute of mostly user CPU to a second. Right, that makes sense to me. > > All of these should be using the same test, I think. Doing that with > > is_git_directory() is probably OK. It is a little more expensive than we > > might want for mass-use (it actually opens and parses the HEAD file in > > each directory), > > This happens as well when we let resolve_gitlink_ref run its old course. > (It (ls-files) even seems to try to open .git and then .git/HEAD, even > if the former fails with ENOENT.) Yes, I think my earlier comment that you are quoting was just misguided. We only do the extra work if the directory actually does look like a gitdir, and the many-directories case we are optimizing here is the opposite of that. So summing up, I think: 1. We could get by with teaching get_ref_cache not to auto-create ref caches for non-git-directories. 2. But for a little more work, pushing the is_git_directory() check out to the call-sites gives us probably saner semantics overall. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html