Jeff King <peff@xxxxxxxx> writes:

> their behavior. Junio could probably say more, or you will have to read
> the code.

Or read what I already said here a few times ;-) I generally do not want
to repeat myself.

There are two semantics of pathspecs:

 (1) exact match, or leading path. e.g.

        git ls-files Makefile Documentation/

 (2) exact match, leading path, or fnmatch(3). e.g.

        git ls-files Makefile Documentation/ '*.txt'

The former is used by the diff family, and the latter by pretty much
everything else.

In very old days, the former was the only kind we supported, and
originally, ls-files didn't even take any pathspecs. Commit 5be4efb
([PATCH] Make "git-ls-files" work in subdirectories, 2005-08-21) taught
ls-files to take pathspecs that can glob, but the diff family never got
updated to match that.

In order to operate on a set of paths, you would need to (a) enumerate
your paths, and (b) filter that enumeration efficiently with pathspecs.

If you are iterating over the index (e.g. "ls-files", "diff-files",
"grep"), there is nothing tricky in the enumeration step. We have a
flat array of names in the index and you just walk active_cache[] from 0
to active_nr.

If on the other hand you are walking in an inherently hierarchical
namespace (e.g. "ls-tree", "diff-tree", "grep --no-index") with a
non-empty set of pathspecs, you need to take advantage of the filtering
behaviour while enumerating the paths; otherwise your performance will
suck.

"Leading path" semantics is easier to understand; if a tree entry you
are looking at is "contrib", and the pathspecs you have are "Makefile"
and "Documentation/", then there is no way anything underneath it (e.g.
"contrib/README") could survive the filtering process, so you can skip
the entry without even opening the sub-tree object. Linus's argument
was "teaching globs to pathspec code would suck in performance" and he
is right in general.
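
To make the two semantics and the "leading path" pruning concrete, here
is a minimal sketch in Python (not git's C code). Everything in it is an
assumption made for illustration: the function names, the toy tree built
from nested dicts, and the use of Python's fnmatchcase(), whose '*' also
matches '/'.

```python
from fnmatch import fnmatchcase

# Toy tree (hypothetical): dict = sub-tree, None = blob.
TREE = {
    "Makefile": None,
    "README": None,
    "Documentation": {"git.txt": None, "howto": {"rebase.txt": None}},
    "contrib": {"README": None},
}


def match_leading(path, pathspecs):
    """Semantics (1): exact match, or leading path."""
    for spec in pathspecs:
        if spec.endswith("/"):
            if path.startswith(spec):
                return True
        elif path == spec or path.startswith(spec + "/"):
            return True
    return False


def match_glob(path, pathspecs):
    """Semantics (2): exact match, leading path, or fnmatch-style glob."""
    return match_leading(path, pathspecs) or any(
        fnmatchcase(path, spec) for spec in pathspecs
    )


def may_contain_match(dirpath, pathspecs):
    """Under semantics (1), can anything below `dirpath` match at all?"""
    d = dirpath + "/"
    for spec in pathspecs:
        s = spec if spec.endswith("/") else spec + "/"
        # Either the directory lies inside the pathspec, or the pathspec
        # names something below the directory.
        if d.startswith(s) or s.startswith(d):
            return True
    return False


def walk(tree, pathspecs, prefix="", opened=None):
    """Yield matching blob paths; record every sub-tree that was opened."""
    for name, entry in sorted(tree.items()):
        path = prefix + name
        if isinstance(entry, dict):
            if may_contain_match(path, pathspecs):
                if opened is not None:
                    opened.append(path)
                yield from walk(entry, pathspecs, path + "/", opened)
        elif match_leading(path, pathspecs):
            yield path


if __name__ == "__main__":
    opened = []
    print(list(walk(TREE, ["Makefile", "Documentation/"], opened=opened)))
    print("opened:", opened)
```

With the pathspecs "Makefile" and "Documentation/", the walk never opens
"contrib", which is the whole point of filtering while enumerating.
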
Because diff-tree is inherently about walking the two tree objects in
parallel, it does not extend its pathspec semantics to globbing: if the
user asked for '*.txt', you would have to open _all_ the tree objects
down to the leaf level to see if they contain any file whose name ends
with .txt. Other members of the diff family (namely, diff-files and
"diff" without --cached or any tree-ish argument) match this behaviour
for consistency, even though theoretically "diff-files" could easily do
globbing, as it walks the flat index namespace.

But I think it is OK to sacrifice the optimization and descend into any
and all subtrees/directories to see if a path that might match the
pattern exists when the user asks for '*.txt', as long as (and this is
a _very_ important point) an update to pathspec logic on the diff side
does not break the optimization unnecessarily. E.g.

        git diff v1.0.0 v1.2.0 -- Makefile 'Documentation/*.txt'

should still skip opening the tree object for 'contrib/' (because
anything underneath contrib/ would never match either pathspec given),
but can and should descend into Documentation. And it should _not_
skip the 'howto' subdirectory in the Documentation/ directory, as it
could find a match with '*.txt' in that subdirectory.

To prepare for this, later reimplementations of pathspec matching logic
(the one used by "git grep") can compute hints meant to be used by the
path enumeration step because, as I explained earlier, enumeration
needs to take advantage of what filtering would do to the paths that it
will find.

By the way I threw this "pathspec unification" onto the list of possible
GSoC ideas, but I suspect it might be a bit too much work to do this
properly for a summer student (and also we might not want to trust this
important part of the system to a summer student).

Another thing that probably needs to be thought through (and I haven't)
and that may be related to this codepath is how to handle
case-insensitive filesystems.
I think we currently do not match paths that we obtain from the
filesystem case insensitively with the given pathspecs (we probably
shouldn't go case insensitive when we are walking the index or the tree
objects, on the other hand).
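
Going back to the 'Documentation/*.txt' example: one possible form of
the "hint" for the enumeration step is the literal part of each
pathspec before its first glob character. The sketch below, in Python,
only illustrates that idea and is not what git's pathspec code does;
literal_prefix(), may_contain_match() and the toy tree are hypothetical
names, and Python's fnmatchcase() (where '*' also matches '/') stands in
for fnmatch(3).

```python
from fnmatch import fnmatchcase

GLOB_CHARS = "*?["

# Toy tree (hypothetical): dict = sub-tree, None = blob.
TREE = {
    "Makefile": None,
    "Documentation": {
        "Makefile": None,
        "git.txt": None,
        "howto": {"rebase.txt": None},
    },
    "contrib": {"README": None},
}


def literal_prefix(spec):
    """Return the part of `spec` before its first glob character."""
    for i, ch in enumerate(spec):
        if ch in GLOB_CHARS:
            return spec[:i]
    return spec


def matches(path, pathspecs):
    """Exact match, leading path, or fnmatch-style glob."""
    for spec in pathspecs:
        if spec.endswith("/"):
            if path.startswith(spec):
                return True
        elif path == spec or path.startswith(spec + "/"):
            return True
        if fnmatchcase(path, spec):
            return True
    return False


def may_contain_match(dirpath, pathspecs):
    """Conservative hint: could any path below `dirpath` possibly match?"""
    d = dirpath + "/"
    for spec in pathspecs:
        lit = literal_prefix(spec)
        if lit != spec:
            # Glob pathspec: a match must start with the literal prefix,
            # and anything below the directory starts with "dirpath/".
            if d.startswith(lit) or lit.startswith(d):
                return True
        else:
            s = spec if spec.endswith("/") else spec + "/"
            if d.startswith(s) or s.startswith(d):
                return True
    return False


def walk(tree, pathspecs, prefix="", opened=None):
    """Yield matching blob paths; record every sub-tree that was opened."""
    for name, entry in sorted(tree.items()):
        path = prefix + name
        if isinstance(entry, dict):
            if may_contain_match(path, pathspecs):
                if opened is not None:
                    opened.append(path)
                yield from walk(entry, pathspecs, path + "/", opened)
        elif matches(path, pathspecs):
            yield path


if __name__ == "__main__":
    opened = []
    print(list(walk(TREE, ["Makefile", "Documentation/*.txt"], opened=opened)))
    print("opened:", opened)
```

A bare '*.txt' has an empty literal prefix, so every sub-tree gets
opened (the optimization is sacrificed), while 'Documentation/*.txt'
still lets the walk skip "contrib" and descend into
"Documentation/howto".
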