Re: [PATCH] read_directory: avoid invoking exclude machinery on tracked files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 15.02.2013 20:32, schrieb Junio C Hamano:
> Duy Nguyen <pclouds@xxxxxxxxx> writes:
> 
>> On Fri, Feb 15, 2013 at 11:52 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>> In the current code, we always check if a path is excluded, and when
>>> dealing with DT_REG/DT_LNK, we call treat_file():
>>>
>>>  * When such a path is excluded, treat_file() returns true when we
>>>    are not showing ignored directories. This causes treat_one_path()
>>>    to return path_ignored, so for excluded DT_REG/DT_LNK paths when
>>>    no DIR_*_IGNORED is in effect, this change is a correct
>>>    optimization.
>>>
>>>  * When such a path is not excluded, on the ther hand, and when we
>>>    are not showing ignored directories, treat_file() just returns
>>>    the value of exclude_file, which is initialized to false and is
>>>    not changed in the function.  This causes treat_one_path() to
>>>    return path_handled.  However, the new code returns path_ignored
>>>    in this case.
>>>
>>> What guarantees that this change is regression free?
>>
>> If you consider read_directory_recursive alone, there is a regression.
>> The return value of r_d_r depends on path_handled/path_ignored. With
>> this patch, the return value will be different.
> 
> That is exactly what was missing from the proposed log message, and
> made me ask "Do all the callers that reach this function in their
> callgraph, when they get path_ignored for a path in the index,
> behave as if the difference between path_ignored and path_handled
> does not matter?"  Your answer seems to be
> 
>  - r-d-r returns 'how many paths in this directory match the
>    criteria we are looking for', unless check_only is true.  Now in
>    some cases we return path_ignored not path_handled, so we may
>    return a number that is greater than we used to return.
> 
>  - treat_directory, the only user of that return value, cares if
>    r-d-r returned 0 or non-zero; and
> 
>  - As long as we keep returning 0 from r-d-r in cases we used to
>    return 0 and non-zero in cases we used to return non-zero, exact
>    number does not matter.  Overall we get the same result.
> 
> I think all of the above is true, but I have not convinced myself
> that r-d-r with the new code never returns 0 when we used to return
> non-zero.
> 

treat_directory calls read_directory_recursive in tow cases:

1.) The directory is not in the index.

---8<---
switch (directory_exists_in_index(dirname, len-1)) {
case index_nonexistent:
	if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
		break;
}
...
---8<---


The directory is not in the index if there are no tracked files in the directory. I.e. cache_name_exists will always be false in this case, so the change won't affect the result of r_d_r.


2.) The directory is in the index but is ignored.

---8<---
switch (directory_exists_in_index(dirname, len-1)) {
case index_directory:
	if ((dir->flags & DIR_SHOW_OTHER_DIRECTORIES) && exclude)
		break;
}

if ((dir->flags & DIR_SHOW_IGNORED) && !exclude) {
	...
}
if (!(dir->flags & DIR_SHOW_IGNORED) &&
    !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
	return show_directory;
if (!read_directory_recursive(dir, dirname, len, 1, simplify))
	return ignore_directory;
return show_directory;
---8<---


With exclude==true, only one r_d_r call is reachable, and only if either DIR_SHOW_IGNORED or DIR_HIDE_EMPTY_DIRECTORIES is set.

2a) DIR_SHOW_IGNORED is set: the patch already checks !(dir->flags & DIR_SHOW_IGNORED), so the result of r_d_r is not affected.

2b) DIR_HIDE_EMPTY_DIRECTORIES is set and DIR_SHOW_IGNORED is not set: the directory is already ignored, so all files in the directory should be ignored, too. It doesn't matter whether treat_one_path returns path_ignored because of the excluded() check or cache_name_exists().


Therefore, I think the patch (v0) is regression-free.



As a side note, I'm quite confused why we would ever want to evaluate .gitignore patterns on tracked files at all, as gitignore(5) clearly states "Files already tracked by git are not affected". There is 'git-ls-files --cached --ignored', although this doesn't seem to process .gitignore files but expects exclude patterns on the command line...

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]