Re: inotify to minimize stat() calls

On Sun, Feb 10, 2013 at 12:17 PM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> On Sun, Feb 10, 2013 at 12:24:58PM +0700, Duy Nguyen wrote:
>> On Sun, Feb 10, 2013 at 12:10 AM, Ramkumar Ramachandra
>> <artagnon@xxxxxxxxx> wrote:
>> > Finn notes in the commit message that it offers no speedup, because
>> > .gitignore files in every directory still have to be read.  I think
>> > this is silly: we really should be caching .gitignore, and touching it
>> > only when lstat() reports that the file has changed.
>> >
>> > ...
>> >
>> > Really, the elephant in the room right now seems to be .gitignore.
>> > Until that is fixed, there is really no use of writing this inotify
>> > daemon, no?  Can someone enlighten me on how exactly .gitignore files
>> > are processed?
>>
>> .gitignore is a different issue. I think it's mainly used with
>> read_directory/fill_directory to collect ignored files (or not-ignored
>> files). And it's not always used (well, status and add do, but diff
>> should not). I think we need to measure how much mass lstat
>> elimination gains us (especially on big repos) and how much
>> .gitignore/.gitattributes caching does.
>
> OK let's count. I start with a "standard" repository, linux-2.6. These
> are the numbers from strace -T on "git status" (*). The first column is
> accumulated time, the second the number of syscalls.
>
> top syscalls sorted     top syscalls sorted
> by acc. time            by number
> ----------------------------------------------
> 0.401906 40950 lstat    0.401906 40950 lstat
> 0.190484 5343 getdents  0.150055 5374 open
> 0.150055 5374 open      0.190484 5343 getdents
> 0.074843 2806 close     0.074843 2806 close
> 0.003216 157 read       0.003216 157 read
>
> The following patch pretends every entry is uptodate without
> lstat. With the patch, we can see the refresh code is the cause of the
> mass lstat, as lstat disappears:
>
> 0.185347 5343 getdents  0.144173 5374 open
> 0.144173 5374 open      0.185347 5343 getdents
> 0.071844 2806 close     0.071844 2806 close
> 0.004918 135 brk        0.003378 157 read
> 0.003378 157 read       0.004918 135 brk
>
> -- 8< --
> diff --git a/read-cache.c b/read-cache.c
> index 827ae55..94d8ed8 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1018,6 +1018,10 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
>         if (ce_uptodate(ce))
>                 return ce;
>
> +#if 1
> +       ce_mark_uptodate(ce);
> +       return ce;
> +#endif
>         /*
>          * CE_VALID or CE_SKIP_WORKTREE means the user promised us
>          * that the change to the work tree does not matter and told
> -- 8< --
>
> The following patch eliminates untracked search code. As we can see,
> open+getdents also disappears with this patch:
>
> 0.462909 40950 lstat   0.462909 40950 lstat
> 0.003417 129 brk       0.003417 129 brk
> 0.000762 53 read       0.000762 53 read
> 0.000720 36 open       0.000720 36 open
> 0.000544 12 munmap     0.000454 33 close
>
> So from the syscall point of view, we know what code issues most of
> them. Let's see how much time we gain by these patches, which
> approximates the gain from inotify support. This time I measure on
> gentoo-x86.git [1] because this one has a really big worktree (100k
> files):
>
>         unmodified  read-cache.c  dir.c     both
> real    0m0.550s    0m0.479s      0m0.287s  0m0.213s
> user    0m0.305s    0m0.315s      0m0.201s  0m0.182s
> sys     0m0.240s    0m0.157s      0m0.084s  0m0.030s
>
> and the syscall picture on gentoo-x86.git:
>
> 1.106615 101942 lstat    1.106615 101942 lstat
> 0.667235 47083 getdents  0.641604 47114 open
> 0.641604 47114 open      0.667235 47083 getdents
> 0.286711 23573 close     0.286711 23573 close
> 0.005842 350 brk         0.005842 350 brk
>
> We can see that shortcutting the untracked code gives a bigger gain
> than the index refresh code. So I have to agree that .gitignore may be
> the big elephant in this particular case.
>
> Bear in mind though this is Linux, where lstat is fast. On systems
> with slow lstat, these timings could look very different due to the
> large number of lstat calls compared to open+getdents. I'd really like
> to see similar numbers on Windows.
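
For anyone who wants to reproduce the per-syscall tables quoted above on
another repository, here is a rough sketch of the tallying step. It
assumes plain "strace -T" lines of the form name(args) = ret <seconds>
with no pid prefix; the struct and function names are made up for
illustration and are not from git or strace.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One row of the table: accumulated time and number of calls. */
struct tally {
	char name[32];
	double seconds;
	unsigned long calls;
};

/*
 * Parse one line such as
 *   lstat("Makefile", {st_mode=S_IFREG|0644, ...}) = 0 <0.000011>
 * Returns 1 and fills name/seconds on success, 0 for anything else
 * (signal lines, exit lines, unfinished calls).
 */
int parse_strace_line(const char *line, char *name, size_t size, double *seconds)
{
	const char *paren, *lt;
	char *end;
	double value;
	size_t len;

	assert(line && name && seconds && size > 0);
	paren = strchr(line, '(');   /* end of the syscall name */
	lt = strrchr(line, '<');     /* start of the trailing <seconds> */
	if (!paren || !lt || paren == line || lt < paren)
		return 0;
	len = (size_t)(paren - line);
	if (len >= size)
		return 0;
	value = strtod(lt + 1, &end);
	if (end == lt + 1 || *end != '>')
		return 0;
	memcpy(name, line, len);
	name[len] = '\0';
	*seconds = value;
	return 1;
}

/* Add one call to the table; returns 0 on success, -1 if the table is full. */
int tally_add(struct tally *t, size_t *n, size_t cap, const char *name, double seconds)
{
	size_t i;

	for (i = 0; i < *n; i++)
		if (!strcmp(t[i].name, name))
			break;
	if (i == *n) {
		if (*n == cap)
			return -1;
		snprintf(t[i].name, sizeof(t[i].name), "%s", name);
		t[i].seconds = 0;
		t[i].calls = 0;
		(*n)++;
	}
	t[i].seconds += seconds;
	t[i].calls++;
	return 0;
}

/* qsort() comparators: largest accumulated time / largest call count first. */
int by_seconds_desc(const void *a, const void *b)
{
	const struct tally *x = a, *y = b;
	return (x->seconds < y->seconds) - (x->seconds > y->seconds);
}

int by_calls_desc(const void *a, const void *b)
{
	const struct tally *x = a, *y = b;
	return (x->calls < y->calls) - (x->calls > y->calls);
}
```

To use it, read the trace (for example from "strace -T -o trace.txt git
status") line by line with fgets(), pass each line through
parse_strace_line() and tally_add(), then qsort() the table once with
each comparator to get the two columns.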

Karsten Blees has done something similar-ish on Windows, and he posted
the results here:

https://groups.google.com/forum/#!topic/msysgit/fL_jykUmUNE/discussion

I also seem to remember him doing a ReadDirectoryChangesW version, but
I don't remember what happened with that.
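
On Ramkumar's point about caching .gitignore and rereading it only when
lstat() reports a change, a minimal sketch of the idea might look like
the following. This is not git's code: struct cached_file and
cached_file_refresh() are hypothetical names, and the check compares
only mtime, size, inode and device, so a same-size rewrite within one
timestamp tick would go unnoticed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical cache entry for one file's contents. */
struct cached_file {
	int valid;        /* nonzero once buf/st have been filled */
	struct stat st;   /* stat data seen when buf was read */
	char *buf;        /* NUL-terminated contents */
	size_t len;
};

/* Simplified change check; a real implementation would compare more fields. */
static int stat_changed(const struct stat *a, const struct stat *b)
{
	return a->st_mtime != b->st_mtime || a->st_size != b->st_size ||
	       a->st_ino != b->st_ino || a->st_dev != b->st_dev;
}

/*
 * Returns 1 if the file was (re)read, 0 if the cached copy is still
 * considered current, -1 on error. One lstat() per call is still
 * needed; only the open/read/close is skipped on a cache hit.
 */
int cached_file_refresh(struct cached_file *c, const char *path)
{
	struct stat st;
	FILE *f;
	char *buf;
	size_t len;

	assert(c && path);
	if (lstat(path, &st) < 0 || !S_ISREG(st.st_mode))
		return -1;
	if (c->valid && !stat_changed(&c->st, &st))
		return 0;

	f = fopen(path, "rb");
	if (!f)
		return -1;
	buf = malloc((size_t)st.st_size + 1);
	if (!buf) {
		fclose(f);
		return -1;
	}
	len = fread(buf, 1, (size_t)st.st_size, f);
	fclose(f);
	buf[len] = '\0';

	free(c->buf);
	c->buf = buf;
	c->len = len;
	c->st = st;
	c->valid = 1;
	return 1;
}
```

Start from a zero-initialized struct cached_file and call
cached_file_refresh() before each use of the buffer. Note that this only
removes the open/read/close for unchanged files; the lstat() itself
remains, which is the part an inotify daemon would be meant to avoid.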