Re: "git add -i" with path gives "Argument list too long"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King <peff@xxxxxxxx> writes:

> their behavior. Junio could probably say more, or you will have to read
> the code.

Or read what I already said here a few times ;-) I generally do not want
to repeat myself.

There are two semantics of pathspecs:

  (1) exact match, or leading path.  e.g.

  	git ls-files Makefile Documentation/

  (2) exact match, leading path, or fnmatch(3).

	git ls-files Makefile Documentation/ '*.txt'

The former is used by the diff family, and the latter by pretty much
everything else.  In very old days, the former was the only kind we
supported, and originally, ls-files didn't even take any pathspecs.

The 5be4efb ([PATCH] Make "git-ls-files" work in subdirectories,
2005-08-21) taught ls-files to take pathspec that can glob, but diff
family never got updated to match that.

In order to operate on set of paths, you would need to (a) enumerate your
paths, and (b) filter that enumeration efficiently with pathspecs.  If you
are iterating over the index (e.g. "ls-files", "diff-files", "grep"),
there is nothing tricky in the enumeration step.  We have a flat array of
names in the index and you just walk active_cache[] from 0 to active_nr.
If on the other hand you are walking in an inherently hierarchical
namespace (e.g. "ls-tree", "diff-tree", "grep --no-index") with non empty
set of pathspecs, you need to take advantage of the filtering behaviour
while enumerating the paths---otherwise your performance will suck.

"Leading path" semantics is easier to understand; if a tree entry you are
looking at is "contrib", and the pathspecs you have are "Makefile" and
"Documentation/", then there is no way for anything underneath it
(e.g. "contrib/README") would survive the filtering process, so you can
skip the entry without even opening the sub-tree object.  Linus's argument
was "teaching globs to pathspec code would suck in performance" and he is
right in general.  Because diff-tree is inherently about walking the two
tree objects in parallel, it does not extend its pathspec semantics to
globbing (i.e. if the user asked for '*.txt', you have to open _all_ the
tree objects down to the leaf level to see if they contain any file whose
name ends with .txt), and other family members of diff (namely, diff-files
and "diff" without --cached nor any tree-ish argument) match this
behaviour for consistency, even though theoretically "diff-files" could
easily do globbing, as it walks the flat index namespace.

But I think it is Ok to sacrifice the optimization and descend into any
and all subtrees/directories to see if a path that might match the pattern
exists when the user asks for '*.txt', as long as (and this is a _very_
important point) an update to pathspec logic on the diff side does not
break the optimization unnecessarily.  E.g.

	git diff v1.0.0 v1.2.0 -- Makefile 'Documentation/*.txt'

should still skip opening tree object for 'contrib/' (because anything
underneath contrib/ would never match either pathspecs given), but can
and should descend into Documentation.  And it should _not_ skip 'howto'
subdirectory in Documentation/ directory, as it could find a match with
'*.txt' in that subdirectory.

To prepare for this, later reimplementations of pathspec matching logic
(the one used by "git grep") can compute hints meant to be used by the
path enumeration step, as I explained earlier, enumeration needs to take
advantage of what filtering would do to paths that it will find.

By the way I threw this "pathspec unification" to the list of possible
GSoC ideas, but I suspect it might be a bit too much work to do this
properly for a summer student (and also we might not want to trust this
important part of the system to a summer student).  Other things that
probably needs to be thought through (and I haven't) that may be related
to this codepath is how to handle case-insensitive filesystems.  I think
we currently do not match paths that we obtain from the filesystem case
insensitively with the given pathspecs (we probably shouldn't go case
insensitive when we are walking the index or the tree objects, on the
other hand).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]