On Thu, Oct 11 2018, dana wrote:

> Hello,
>
> I'm a contributor to ripgrep, which is a grep-like tool that supports using
> gitignore files to control which files are searched in a repo (or any other
> directory tree). ripgrep's support for the patterns in these files is based on
> git's official documentation, as seen here:
>
>   https://git-scm.com/docs/gitignore
>
> One of the most common reports on the ripgrep bug tracker is that it does not
> allow patterns like the following real-world examples, where a ** is used along
> with other text within the same path component:
>
>   **/**$$*.java
>   **.orig
>   **local.properties
>   !**.sha1
>
> The reason it doesn't allow them is that the gitignore documentation explicitly
> states that they're invalid:
>
>   A leading "**" followed by a slash means match in all directories...
>
>   A trailing "/**" matches everything inside...
>
>   A slash followed by two consecutive asterisks then a slash matches zero or
>   more directories...
>
>   Other consecutive asterisks are considered invalid.
>
> git itself happily accepts these patterns, however, apparently treating the **
> like a single * without fnmatch(3)'s FNM_PATHNAME flag set (in other words, it
> matches / as a regular character, thus crossing directory boundaries).
>
> ripgrep's developer is loathe to reverse-engineer this undocumented behaviour,
> and so the reports keep coming, both for ripgrep itself and for down-stream
> consumers of it and its ignore crate (including most notably Microsoft's VS Code
> editor).
>
> My request: Assuming that git's actual handling of these patterns is intended,
> would it be possible to make it 'official' and explicitly add it to the
> documentation?
>
> References (the first one is the main bug):
>
>   https://github.com/BurntSushi/ripgrep/issues/373
>   https://github.com/BurntSushi/ripgrep/issues/507
>   https://github.com/BurntSushi/ripgrep/issues/859
>   https://github.com/BurntSushi/ripgrep/issues/945
>   https://github.com/BurntSushi/ripgrep/issues/1080
>   https://github.com/BurntSushi/ripgrep/issues/1082
>   https://github.com/Microsoft/vscode/issues/24050

Yeah, those docs seem wrong. In general the docs for the matching
function are quite bad. I have on my TODO list to factor this out into
some gitwildmatch manpage, but right now the bit in gitignore is all we
have.

Our matching function comes from rsync originally, whose manpage says:

    use '**' to match anything, including slashes.

I believe this is accurate as far as the implementation goes. You can
also see the rather exhaustive tests here:
https://github.com/git/git/blob/master/t/t3070-wildmatch.sh

Note the different behavior with e.g. --glob-pathspecs vs. the
default. There's also stuff like:

    $ grep diff=perl .gitattributes
    *.perl eol=lf diff=perl
    *.pl eof=lf diff=perl
    *.pm eol=lf diff=perl
    $ git ls-files ":(attr:diff=perl)" | wc -l
    65

And then the exclude syntax. This is not in .gitignore:

    $ git ls-files ":(exclude)*.pm" ":(attr:diff=perl)" | wc -l
    41
    $ git ls-files ":^*.pm" ":(attr:diff=perl)" | wc -l
    41

I.e. we have wildmatch() on one hand and then the pathspec matching.

For an unrelated thing I have been thinking of adding a new plumbing
command that would get input like this on stdin:

    1 text t/t0202/test.pl\0\0
    2 text perl/Git.pm\0\0
    3 text *.pm\0\0
    4 text :^*.pm\0:(attr:diff=perl)\0\0
    5 match glob,icase\04\03\0\0
    6 match icase\04\02\0\0
    7 match \04\01\0\0

Which would return (in any order):

    1 OK
    2 OK
    3 OK
    4 OK
    5 NO
    6 NO
    7 YES

Or whatever. I.e. something where you can feed various strings into a
program as a batch, and then ask how some of them would match against
each other.
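
Purely to illustrate the input format above, here is a rough sketch of
how a client might serialize such records. No such git command exists;
the framing (fields joined by NUL, each record ended by a double NUL,
records simply concatenated) is only inferred from the example, and the
helper names encode_record, text and match are made up:

```python
from typing import Sequence

NUL = b"\0"


def encode_record(num: int, kind: str, fields: Sequence[str]) -> bytes:
    # Assumed layout: "<num> <kind> " + NUL-joined fields + double NUL.
    if kind not in ("text", "match"):
        raise ValueError(f"unknown record kind: {kind!r}")
    if any("\0" in field for field in fields):
        raise ValueError("fields must not contain NUL bytes")
    head = f"{num} {kind} ".encode("utf-8")
    body = NUL.join(field.encode("utf-8") for field in fields)
    return head + body + NUL + NUL


def text(num: int, *strings: str) -> bytes:
    # A "text" record carries one or more strings (paths or pathspecs).
    return encode_record(num, "text", strings)


def match(num: int, flags: Sequence[str], *record_ids: int) -> bytes:
    # A "match" record carries comma-joined flags (possibly empty),
    # followed by the numbers of the records to match against each other.
    fields = [",".join(flags)] + [str(i) for i in record_ids]
    return encode_record(num, "match", fields)


# The seven example records from above, concatenated (an assumption:
# the double NUL is taken to be the only record separator).
batch = b"".join([
    text(1, "t/t0202/test.pl"),
    text(2, "perl/Git.pm"),
    text(3, "*.pm"),
    text(4, ":^*.pm", ":(attr:diff=perl)"),
    match(5, ["glob", "icase"], 4, 3),
    match(6, ["icase"], 4, 2),
    match(7, [], 4, 1),
])
```

The resulting bytes would then be written to the command's stdin in
one go, which is the whole point of doing this as a batch.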

The reason I want such a command is to extend something like
git-multimail[1] with a config where users can subscribe to changes to
paths as declared by git pathspecs, and be informed about which of the
glob rules defined in their config matched. Right now, e.g. for a
"git push", you'd need to run each match rule for each user with such a
config against each changed path in each pushed commit, but plumbing
like this would allow feeding in arbitrary combinations of those and
asking what matches against what.

The reason I'm on this tangent is to ask whether, if such a thing
existed, it's something you could see ripgrep using. I.e. ask git,
given its .gitignore and .gitattributes, what matches one of its
pathspecs, instead of carrying your own bug-compatible implementation.

1. https://github.com/git-multimail/git-multimail/