Jean-Luc Herren venit, vidit, dixit 21.11.2008 13:46: > Glenn Griffin wrote: >> On Thu, Nov 20, 2008 at 4:28 PM, Jean-Luc Herren <jlh@xxxxxx> wrote: >>> The first 'git status' shows the same difference as the second, >>> just the second time it's staged instead of unstaged. Why does it >>> take 16 seconds the second time when it's instant the first time? >> I believe the two runs of git status need to do very different things. >> When run the first time, git knows the files in your working >> directory are not in the index so it can easily say those files are >> 'Changed but not updated' just from their existence. > > I might be mistaken about how the index works, but those paths > *are* in the index at that time. They just have the old content, > i.e. the same content as in HEAD. When HEAD == index, then > nothing is staged. > > But the presence of those files alone doesn't tell you that they > have changed. You have to look at the content and compare it to > the index (== HEAD in this situation) to see whether they have > changed or not and for some reason git can do this very quickly. > >> The second run >> those files do exist in both the index and the working directory, so >> git status first shows the files that are 'Changes to be committed' >> and that should be fast, but additionally git status will check to see >> if those files in your working directory have changed since you added >> them to the index. > > Which is basically the same comparision as above, just it turns > out that they have not changed. But even then, we're talking > about comparing a 1 byte file in the index to a 1 byte file in the > work tree. That doesn't take 16 seconds, even for 100 files. > > So this makes me believe it's the first step (comparing HEAD to > the index to show staged changes) that is slow. And when you > compare a 1MB file to a 1 byte file, you don't need to read all of > the big file, you can tell they're not the same right after the > first byte. (Even an doing stat() is enough, since the size is > not the same.) > > Another thing that came to my mind is maybe rename detection kicks > in, even though no path vanished and none is new. I believe > rename detection doesn't happen for unstaged changes, which might > explain the difference in speed. > > btw, I forgot to mention that I get this with branches maint, > master, next and pu. Interestingly, all of git diff --stat git diff --stat --cached git diff --stat HEAD are "fast" (0.2s or so), i.e. diffing index-wtree, HEAD-index, HEAD-wtree. Linus' threaded stat doesn't help either for status, btw (20s). Experimenting further: Using 10 files with 10MB each (rather than 100 times 1MB) brings down the time by a factor 10 roughly - and so does using 100 files with 100k each. Huh? Latter may be expected (10MB total), but former (100MB total)? Now it's getting funny: Changing your "echo >" to "echo ">>" (in your 100 files 1MB case) makes things "almost fast" again (1.3s). OK, it's "use the source, Luke" time... Actually the part you don't see takes the most time: wt_status_print_updated() And in fact I can confirm your suspicion: wt_status_print_updated() enforces rename detection (ignoring any config). Forcing it off (rev.diffopt.detect_rename = 0;) cuts down the 20s to 0.75s. How about a config option status.renames (or something like -M) for status? Michael -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html