Re: [BUG] gitignore documentation inconsistent with actual behaviour

On Thu, Oct 11, 2018 at 05:19:06AM -0500, dana wrote:
> Hello,
> 
> I'm a contributor to ripgrep, which is a grep-like tool that supports using
> gitignore files to control which files are searched in a repo (or any other
> directory tree). ripgrep's support for the patterns in these files is based on
> git's official documentation, as seen here:
> 
>   https://git-scm.com/docs/gitignore
> 
> One of the most common reports on the ripgrep bug tracker is that it does not
> allow patterns like the following real-world examples, where a ** is used along
> with other text within the same path component:
> 
>   **/**$$*.java
>   **.orig
>   **local.properties
>   !**.sha1
> 
> The reason it doesn't allow them is that the gitignore documentation explicitly
> states that they're invalid:
>
> ...

I've checked the code and run some tests. There is a twist here: "**"
is only special when matched in "pathname" mode, that is, when the
pattern contains at least one slash. Of your patterns above, only the
first one is matched that way.

When '**' is special, if it is not one of '**/', '/**/' or '/**', it
_is_ considered invalid (i.e. a bad pattern) and the pattern will not
match anything.

The confusion comes from the remaining three patterns. They contain no
slash, so '**' is not special there: it is treated as a regular '*'
and the pattern still matches stuff.
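
To make the three cases concrete, here is a rough standalone sketch of
the rule as described above. It is not git's code: classify() and the
enum names are made up for illustration, and it deliberately ignores
backslash escapes, bracket expressions and the trailing-slash handling
that real gitignore parsing performs.

```c
#include <string.h>

/* Hypothetical names, for illustration only. */
enum starstar_kind {
	STARSTAR_PLAIN,     /* no slash in pattern: '**' acts like a regular '*' */
	STARSTAR_VALID,     /* pathname mode: every '**' is a whole path component */
	STARSTAR_MALFORMED  /* pathname mode: a '**' shares a component with other text */
};

static enum starstar_kind classify(const char *pattern)
{
	const char *p;

	/* Assumption from the text above: pathname mode needs a slash. */
	if (!strchr(pattern, '/'))
		return STARSTAR_PLAIN;

	for (p = pattern; *p; p++) {
		const char *start;

		if (p[0] != '*' || p[1] != '*')
			continue;

		/* Skip the whole run of asterisks. */
		start = p;
		while (*p == '*')
			p++;

		/* The run must be bounded by '/' or by the pattern's ends. */
		if ((start != pattern && start[-1] != '/') ||
		    (*p != '\0' && *p != '/'))
			return STARSTAR_MALFORMED;

		if (!*p)
			break;
	}
	return STARSTAR_VALID;
}
```

Under these assumptions, classify("**/**$$*.java") gives
STARSTAR_MALFORMED, the other three patterns give STARSTAR_PLAIN, and
the documented example "a/**/b" gives STARSTAR_VALID.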

So, I think we have two options. The documentation could be clarified
with something like this:

-- 8< --
diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
index d107daaffd..500cd43939 100644
--- a/Documentation/gitignore.txt
+++ b/Documentation/gitignore.txt
@@ -100,7 +100,8 @@ PATTERN FORMAT
    a shell glob pattern and checks for a match against the
    pathname relative to the location of the `.gitignore` file
    (relative to the toplevel of the work tree if not from a
-   `.gitignore` file).
+   `.gitignore` file). Note that the "two consecutive asterisks" rule
+   below does not apply.
 
  - Otherwise, Git treats the pattern as a shell glob: "`*`" matches
    anything except "`/`", "`?`" matches any one character except "`/`"
@@ -129,7 +130,8 @@ full pathname may have special meaning:
    matches zero or more directories. For example, "`a/**/b`"
    matches "`a/b`", "`a/x/b`", "`a/x/y/b`" and so on.
 
- - Other consecutive asterisks are considered invalid.
+ - Other consecutive asterisks are considered invalid and the pattern
+   is ignored.
 
 NOTES
 -----
-- 8< --

Or we could make the behavior consistent. If '**' is invalid, just
treat it as two separate regular '*'. Then all four of your patterns
will behave the same way. The change for that is quite simple:

-- 8< --
diff --git a/wildmatch.c b/wildmatch.c
index d074c1be10..64087bf02c 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -104,8 +104,10 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 					    dowild(p + 1, text, flags) == WM_MATCH)
 						return WM_MATCH;
 					match_slash = 1;
-				} else
-					return WM_ABORT_MALFORMED;
+				} else {
+					/* without WM_PATHNAME, '*' == '**' */
+					match_slash = flags & WM_PATHNAME ? 0 : 1;
+				}
 			} else
 				/* without WM_PATHNAME, '*' == '**' */
 				match_slash = flags & WM_PATHNAME ? 0 : 1;
-- 8< --

Which way should we go? I'm leaning towards the second one...
--
Duy


