On Sun, Feb 10, 2013 at 12:17 PM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> On Sun, Feb 10, 2013 at 12:24:58PM +0700, Duy Nguyen wrote:
>> On Sun, Feb 10, 2013 at 12:10 AM, Ramkumar Ramachandra
>> <artagnon@xxxxxxxxx> wrote:
>> > Finn notes in the commit message that it offers no speedup, because
>> > .gitignore files in every directory still have to be read. I think
>> > this is silly: we really should be caching .gitignore, and touching it
>> > only when lstat() reports that the file has changed.
>> >
>> > ...
>> >
>> > Really, the elephant in the room right now seems to be .gitignore.
>> > Until that is fixed, there is really no use of writing this inotify
>> > daemon, no? Can someone enlighten me on how exactly .gitignore files
>> > are processed?
>>
>> .gitignore is a different issue. I think it's mainly used with
>> read_directory/fill_directory to collect ignored files (or not-ignored
>> files). And it's not always used (well, status and add do, but diff
>> should not). I think we need to measure how much mass lstat
>> elimination gains us (especially on big repos) and how much
>> .gitignore/.gitattributes caching does.
>
> OK let's count. I start with a "standard" repository, linux-2.6. These
> are the numbers from strace -T on "git status" (*). The first column is
> accumulated time, the second the number of syscalls.
>
> top syscalls sorted        top syscalls sorted
> by acc. time               by number
> ----------------------------------------------
> 0.401906 40950 lstat       0.401906 40950 lstat
> 0.190484  5343 getdents    0.150055  5374 open
> 0.150055  5374 open        0.190484  5343 getdents
> 0.074843  2806 close       0.074843  2806 close
> 0.003216   157 read        0.003216   157 read
>
> The following patch pretends every entry is uptodate without
> lstat.
> With the patch, we can see that the refresh code is the cause of the
> mass lstat, as lstat disappears:
>
> 0.185347  5343 getdents    0.144173  5374 open
> 0.144173  5374 open        0.185347  5343 getdents
> 0.071844  2806 close       0.071844  2806 close
> 0.004918   135 brk         0.003378   157 read
> 0.003378   157 read        0.004918   135 brk
>
> -- 8< --
> diff --git a/read-cache.c b/read-cache.c
> index 827ae55..94d8ed8 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1018,6 +1018,10 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
>  	if (ce_uptodate(ce))
>  		return ce;
>  
> +#if 1
> +	ce_mark_uptodate(ce);
> +	return ce;
> +#endif
>  	/*
>  	 * CE_VALID or CE_SKIP_WORKTREE means the user promised us
>  	 * that the change to the work tree does not matter and told
> -- 8< --
>
> The following patch eliminates untracked search code. As we can see,
> open+getdents also disappears with this patch:
>
> 0.462909 40950 lstat       0.462909 40950 lstat
> 0.003417   129 brk         0.003417   129 brk
> 0.000762    53 read        0.000762    53 read
> 0.000720    36 open        0.000720    36 open
> 0.000544    12 munmap      0.000454    33 close
>
> So from the syscall point of view, we know what code issues most of
> them. Let's see how much time we gain by these patches, which is an
> approximation of the gain from inotify support. This time I measure on
> gentoo-x86.git [1] because this one has a really big worktree (100k
> files)
>
>         unmodified  read-cache.c  dir.c     both
> real    0m0.550s    0m0.479s      0m0.287s  0m0.213s
> user    0m0.305s    0m0.315s      0m0.201s  0m0.182s
> sys     0m0.240s    0m0.157s      0m0.084s  0m0.030s
>
> and the syscall picture on gentoo-x86.git:
>
> 1.106615 101942 lstat      1.106615 101942 lstat
> 0.667235  47083 getdents   0.641604  47114 open
> 0.641604  47114 open       0.667235  47083 getdents
> 0.286711  23573 close      0.286711  23573 close
> 0.005842    350 brk        0.005842    350 brk
>
> We can see that shortcutting the untracked code gives a bigger gain
> than the index refresh code. So I have to agree that .gitignore may be
> the big elephant in this particular case.
>
> Bear in mind though this is Linux, where lstat is fast. On systems
> with slow lstat, these timings could look very different due to the
> large number of lstat calls compared to open+getdents. I would really
> like to see similar numbers on Windows.

Karsten Blees has done something similar-ish on Windows, and he posted
the results here:

https://groups.google.com/forum/#!topic/msysgit/fL_jykUmUNE/discussion

I also seem to remember him doing a ReadDirectoryChangesW version, but I
don't remember what happened with that.
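
For illustration only, here is a rough sketch of the idea Ramkumar
raises above: keep the contents of a .gitignore in memory and re-read
the file only when lstat() reports a change. This is not git code. The
struct ignore_cache and ignore_cache_load() names are made up for the
example, the buffer is a fixed size, and the staleness check compares
only device, inode, size and mtime, so a same-size edit within one
mtime tick would go unnoticed.

```c
#define _XOPEN_SOURCE 700	/* expose lstat() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical cache slot for one .gitignore file (not a git structure). */
struct ignore_cache {
	int valid;		/* 0 until the file has been read successfully */
	dev_t dev;		/* stat fields recorded at the last read */
	ino_t ino;
	off_t size;
	time_t mtime;
	char data[4096];	/* raw contents, truncated to fit the buffer */
	size_t len;
	unsigned reads;		/* how many times the file was actually opened */
};

/* Nonzero if the recorded stat fields still describe the file on disk. */
static int stat_matches(const struct ignore_cache *c, const struct stat *st)
{
	return c->valid &&
	       c->dev == st->st_dev && c->ino == st->st_ino &&
	       c->size == st->st_size && c->mtime == st->st_mtime;
}

/*
 * Make sure the cache holds the current contents of 'path'.
 * Costs one lstat() per call; open/read happen only when the stat
 * data differs from what was recorded. Returns 0 on success, -1 if
 * the file cannot be stat'ed or opened (the cache is then invalid).
 */
static int ignore_cache_load(struct ignore_cache *c, const char *path)
{
	struct stat st;
	FILE *fp;

	assert(c && path);
	if (lstat(path, &st) < 0) {
		c->valid = 0;
		return -1;
	}
	if (stat_matches(c, &st))
		return 0;	/* unchanged: reuse the cached contents */

	fp = fopen(path, "rb");
	if (!fp) {
		c->valid = 0;
		return -1;
	}
	c->len = fread(c->data, 1, sizeof(c->data) - 1, fp);
	c->data[c->len] = '\0';
	fclose(fp);

	c->dev = st.st_dev;
	c->ino = st.st_ino;
	c->size = st.st_size;
	c->mtime = st.st_mtime;
	c->valid = 1;
	c->reads++;
	return 0;
}
```

Note that this still costs one lstat() per .gitignore on every run; it
only avoids the open/read/close. Getting rid of the lstat() as well is
what something like the inotify daemon would be for.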