Re: [RFC] Speed up "git status" by caching untracked file info

On Fri, Apr 18, 2014 at 2:40 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@xxxxxxxxx> writes:
>
>>            first run  second (cached) run
>> gentoo-x86    500 ms             71.6  ms
>> wine          140 ms              9.72 ms
>> webkit        125 ms              6.88 ms
>> linux-2.6     106 ms             16.2  ms
>>
>> Basically untracked time is cut to one tenth in the best case
>> scenario. The final numbers would be a bit higher because I haven't
>> stored or read the cache from the index yet. Real commit message follows.
>
> As you allude to later with "if you recompile a single file, the
> whole hierarchy in that directory is lost", two back-to-back runs of
> "git status" are not very interesting.

No. If you recompile in directory A, we only need to recompute the
exclude results for A, not for A/B, A/B/C and so on. We need to
invalidate the whole hierarchy only when A/.gitignore (or, worse,
$GIT_DIR/info/exclude) changes.
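
To make the distinction concrete, here is a minimal C sketch under assumed names (struct ucache_dir, add_child(), check_dir() and the exclude_oid field are hypothetical, not git's actual code). A directory whose mtime changed drops only its own cached list, while a changed .gitignore drops that directory and everything below it, because subdirectories inherit its exclude patterns. A change to $GIT_DIR/info/exclude would correspond to invalidating the subtree at the root.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical node of an untracked cache, one per directory. */
struct ucache_dir {
	const char *name;
	int valid;                 /* is the cached untracked list usable? */
	unsigned exclude_oid;      /* stand-in for a hash of this dir's .gitignore */
	struct ucache_dir *child[MAX_CHILDREN];
	size_t nr_child;
};

void add_child(struct ucache_dir *parent, struct ucache_dir *child)
{
	assert(parent->nr_child < MAX_CHILDREN);
	parent->child[parent->nr_child++] = child;
}

/* Entries of this directory changed (e.g. a new .o file): only it is stale. */
static void invalidate_one(struct ucache_dir *d)
{
	d->valid = 0;
}

/* Exclude rules changed: subdirectories inherit them, so the subtree is stale. */
static void invalidate_subtree(struct ucache_dir *d)
{
	size_t i;

	d->valid = 0;
	for (i = 0; i < d->nr_child; i++)
		invalidate_subtree(d->child[i]);
}

/* Decide how much to drop, given what stat() and .gitignore say about d now. */
void check_dir(struct ucache_dir *d, int mtime_changed, unsigned gitignore_oid)
{
	if (gitignore_oid != d->exclude_oid) {
		d->exclude_oid = gitignore_oid;
		invalidate_subtree(d);
	} else if (mtime_changed) {
		invalidate_one(d);
	}
}
```

Recompiling in A corresponds to check_dir(A, 1, same_oid), which leaves A/B and A/B/C cached; editing A/.gitignore corresponds to passing a new oid, which clears all three.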

>> The second one can be hooked from read-cache.c. Whenever a file (or a
>> submodule) is added to or removed from a directory, we invalidate that
>> directory. This will be done in a later patch.
>
> I would imagine that it would be done at the same places as we
> invalidate cache-trees, with the same "invalidation percolates up"
> logic.

Yep yep.
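
A minimal sketch of that percolation, assuming a hypothetical in-memory tree (struct dir_node, find_child() and invalidate_path() are illustrative names, not git's API): every directory on the path from the root down to the one containing the added or removed file is marked stale, in the spirit of cache-tree invalidation, while sibling directories keep their cached data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 8

/* Hypothetical cached directory: valid == 0 means "re-read on next status". */
struct dir_node {
	const char *name;          /* last path component ("" for the root) */
	int valid;
	struct dir_node *child[MAX_CHILDREN];
	size_t nr_child;
};

void add_child(struct dir_node *parent, struct dir_node *child)
{
	assert(parent->nr_child < MAX_CHILDREN);
	parent->child[parent->nr_child++] = child;
}

static struct dir_node *find_child(struct dir_node *d, const char *name, size_t len)
{
	size_t i;

	for (i = 0; i < d->nr_child; i++)
		if (strlen(d->child[i]->name) == len &&
		    !memcmp(d->child[i]->name, name, len))
			return d->child[i];
	return NULL;
}

/*
 * A file at "path" (relative to the top) was added to or removed from the
 * index: mark every directory from the root down to its parent as stale.
 */
void invalidate_path(struct dir_node *root, const char *path)
{
	struct dir_node *d = root;
	const char *slash;

	d->valid = 0;
	while ((slash = strchr(path, '/')) != NULL) {
		d = find_child(d, path, (size_t)(slash - path));
		if (!d)
			return;    /* directory not cached yet, nothing more to drop */
		d->valid = 0;
		path = slash + 1;
	}
}
```

For example, invalidating "a/b/c.txt" clears the root, a and a/b, but leaves a sibling directory x untouched.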

>> On subsequent runs, read_directory_recursive() reads stat info of the
>> directory in question and verifies if files/dirs have been added or
>> removed.
>
> Hmph.  If you have a two-level hierarchy D1/D2 and you change the
> list of crufts in D2 but not in D1, the mtime of D1/D2 changes but
> not the mtime of D1, as you observed below.
>
>> With the help of prep_exclude() to verify .gitignore chain,
>> it may decide "all is well" and enable the fast path in
>> treat_path(). read_directory_recursive() is still called for
>> subdirectories even in fast path, because a directory mtime does not
>> cover all subdirs recursively.
>
> I wonder if you can avoid recursing into D1 when no cached mtime
> (and .gitignore) information has changed in any subdirectory of it
> (e.g. both D1 and D1/D2 match the cache).

The problem is that when we need to decide whether to recurse into D1,
we have no idea whether any of its subdirs has changed. So we need to
recurse into it anyway (at least in the cache: if D1 is unchanged, we do
not opendir() it, we just take the exclude list from the cache and move on).
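
The walk described above might look roughly like this hypothetical sketch (struct cdir, walk() and rescan() are made-up names, a disk_mtime field stands in for calling stat(2), and the .gitignore checks done by prep_exclude() are left out). Only directories whose mtime changed are re-read, but the walk still descends into every cached subdirectory, since an unchanged D1 says nothing about D1/D2.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical cached directory; disk_mtime stands in for a stat(2) call. */
struct cdir {
	const char *name;
	long cached_mtime;         /* mtime recorded when the cache was filled */
	long disk_mtime;           /* mtime observed on this run */
	struct cdir *child[MAX_CHILDREN];
	size_t nr_child;
};

int rescan_count;              /* how many directories had to be re-read */

void add_child(struct cdir *parent, struct cdir *child)
{
	assert(parent->nr_child < MAX_CHILDREN);
	parent->child[parent->nr_child++] = child;
}

/*
 * Placeholder for opendir()/readdir() plus rebuilding d's untracked list
 * (a real version would also refresh d's list of subdirectories).
 */
static void rescan(struct cdir *d)
{
	rescan_count++;
	d->cached_mtime = d->disk_mtime;
}

void walk(struct cdir *d)
{
	size_t i;

	if (d->disk_mtime != d->cached_mtime)
		rescan(d);
	/*
	 * Even when d is unchanged, its mtime says nothing about its
	 * subdirectories, so descend into every cached child anyway.
	 */
	for (i = 0; i < d->nr_child; i++)
		walk(d->child[i]);
}
```

With D1 unchanged and D1/D2 modified, one walk re-reads only D2; a second walk re-reads nothing, although it still visits both cached nodes.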
-- 
Duy