Re: git 2.34.0: Behavior of `**` in gitignore is different from previous versions.

On Fri, Nov 19 2021, Derrick Stolee wrote:

> On 11/19/2021 3:05 PM, Johannes Sixt wrote:
>> Am 19.11.21 um 15:51 schrieb Derrick Stolee:
>>> What is unclear to me is what exactly "match a directory" means.
>>> If we ignore a directory, then we ignore everything inside it (until
>>> another pattern says we should care about it), but the converse
>>> should also hold: if we have a pattern like "!data/**/", then that
>>> should mean "include everything inside data/<A>/ where <A> is any
>>> directory name".
>>>
>>> My inability to form a mental model where the existing behavior
>>> matches the documented specification is an indicator that this was
>>> changed erroneously. A revert patch is included at the end of this
>>> message.
>>>
>>> If anyone could help clarify my understanding here, then maybe
>>> there is room for improving the documentation.
>> 
>> You form a wrong mental model when you start with the grand picture of a
>> working tree. That is, when you say
>> 
>> - here I have theeeeeese many files and directories,
>> - and I want to ignore some: foo/**/,
>> - but I don't want to ignore others: !bar/**/.
>> 
>> This forms the wrong mental model because that is not how Git sees the
>> working tree: it never has a grand picture of all of its contents.
>> 
>> Git only ever sees the contents of one directory. When Git determines
>> that a sub-directory is ignored, then that one's contents are never
>> inspected, and there is no opportunity to un-ignore some of the
>> sub-directory's contents.
>
> So the problem is this: I want to know "I have a file named <X>, and
> a certain pattern set, does <X> match the patterns or not?" but in
> fact it's not just "check <X> against the patterns in order" but
> actually "check every parent directory of <X> in order to see if
> any directory is unmatched, which would preclude any later matches
> to other parents of <X>"
>
> So really, to check a path, we really want to first iterate on the
> parent directories. If we get a match on a positive pattern on level
> i, then we check level (i+1) for a match on a negative pattern. If
> we find that negative pattern match, then continue. If we do not see
> a negative match, then we terminate by matching the entire path <X>.
>
> I'm still not seeing a clear way of describing the matching procedure
> here for a single path, and that's fine. My understanding is not a
> necessary condition for fixing this bug.

Just watching this thread from the sidelines, I think it would help if
this could be distilled down to a wildmatch() test that doesn't involve
the pathspec matching code.

I.e. if you stick the "should this match?" cases into t3070, does it do
the same thing? Or is this to do with the pathspec-specific sugar on
top: that it splits paths and then matches them, that some information
about the path type is added on top, or the specifics of the
exclude/include gitignore matching?
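
To make that distinction concrete, here is a rough Python sketch of the
per-directory walk described in the quoted text above, kept separate
from the glob matching itself. It is not Git's code: glob_to_regex(),
last_match() and is_ignored() are names made up for illustration, the
glob translation only handles "*" and "**", and the three-line pattern
set is an assumed example built around the "!data/**/" pattern mentioned
earlier.

```python
import re

def glob_to_regex(pattern):
    """Rough glob translation (an assumption, not wildmatch):
    '**/' spans zero or more directories, a bare '**' spans anything,
    and a single '*' stays within one path component."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")

def last_match(path, is_dir, patterns):
    """Return True (ignored), False (un-ignored) or None (no match).
    The last matching pattern wins."""
    verdict = None
    for raw in patterns:
        negated = raw.startswith("!")
        body = raw[1:] if negated else raw
        if body.endswith("/"):
            # A trailing slash restricts the pattern to directories.
            if not is_dir:
                continue
            body = body[:-1]
        if glob_to_regex(body).match(path):
            verdict = not negated
    return verdict

def is_ignored(path, patterns):
    """Walk the path one level at a time. As soon as a parent
    directory is ignored, stop: nothing below it is inspected."""
    parts = path.split("/")
    for depth in range(1, len(parts) + 1):
        prefix = "/".join(parts[:depth])
        is_dir = depth < len(parts)
        if last_match(prefix, is_dir, patterns):
            return True
    return False

patterns = ["data/**", "!data/**/", "!data/**/*.txt"]
print(is_ignored("data/a/b.txt", patterns))   # False
print(is_ignored("data/a/b.bin", patterns))   # True
# Without the directory un-ignore, "data/a" is ignored first and the
# "*.txt" negation never gets a chance to apply.
print(is_ignored("data/a/b.txt", ["data/**", "!data/**/*.txt"]))  # True
```

In this toy version, glob_to_regex() stands in for the part a
wildmatch() test could cover, while is_ignored() stands in for the
directory-by-directory logic layered on top of it.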

FWIW I have some old WIP patches somewhere where I made this match
behavior much faster by compiling the glob syntax into PCRE patterns
(using a mode PCREv2 has), which are then JIT'ed and matched.

To do that I had to unpeel this whole truncation of the pattern thing,
and IIRC it didn't matter for speed (or maybe it did just with the
wildmatch code?).

Maybe all of this is irrelevant, sorry. I haven't looked into this issue
at all, just skimmed this growing thread over the past day, maybe some
of the above helps, or not...


