On 11.02.13 03:56, Duy Nguyen wrote:
> On Mon, Feb 11, 2013 at 3:16 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> The other "lstat()" experiment was a very interesting one, but this
>> is not yet an interesting experiment to see where in the "ignore"
>> codepath we are spending time.
>>
>> We know that we can tell wt_status_collect_untracked() not to bother
>> with the untracked or ignored files with !s->show_untracked_files
>> already, but I think the more interesting question is if we can show
>> the untracked files with less overhead.
>>
>> If we want to show untracked files, it is a given that we need to
>> read directories to see what paths there are on the filesystem. Is
>> the opendir/readdir cost dominating in the process? Are we spending
>> a lot of time sifting the result of opendir/readdir via the ignore
>> mechanism? Is reading the "ignore" files costing us much to prime
>> the ignore mechanism?
>>
>> If readdir cost is dominant, then that makes "cache gitignore" a
>> nonsense proposition, I think. If you really want to "cache"
>> something, you need to have somebody (i.e. a daemon) who constantly
>> keeps an eye on the filesystem changes and can respond with the up
>> to date result directly to fill_directory(). I somehow doubt that
>> it is a direction we would want to go in, though.
>
> Yeah, it did not just cut out syscall cost, I also cut a lot of user-space
> processing (plus .gitignore content access). From the timings I posted
> earlier,
>
>>         unmodified  dir.c
>> real    0m0.550s    0m0.287s
>> user    0m0.305s    0m0.201s
>> sys     0m0.240s    0m0.084s
>
> sys time is reduced from 0.24s to 0.08s, so readdir+opendir definitely
> has something to do with it (and perhaps reading .gitignore). But it
> also reduces user time from 0.305 to 0.201s. I don't think avoiding
> readdir+opendir will bring us this gain. It's probably the cost of
> matching .gitignore. I'll try to replace opendir+readdir with a
> no-syscall version.
> At this point "untracked caching" sounds more feasible (and less
> complex) than ".gitignore caching".
>

Thanks to Duy for the measurements and patches.

I took the liberty of converting the patched dir.c into a "runtime
configurable" git status option.
I'm not sure whether the following copy-and-paste work applies cleanly
(it is based on Git 1.8.1.3), but the time spent in
"git status --changed-only" is basically half the time of "git status",
similar to what Duy has measured.

I tested on both a Linux box and Mac OS.
The speedup is so impressive that I am tempted to submit a patch
similar to the following. What do we think about it?
/Torsten

-- >8 --
[PATCH] git status: add option changed-only

git status may run faster if
- we only check whether files already known to git are changed.
- we don't check whether there are untracked files.

"git status --changed-only" (or the short form "git status -c")
will only check for changed files which are already known to git,
and which are in the index.
The call to read_directory_recursive() is skipped and untracked files
in the working tree are not reported.

Inspired-by: Duy Nguyen <pclouds@xxxxxxxxx>
Signed-off-by: Torsten Bögershausen <tboegi@xxxxxx>
---
 builtin/commit.c | 2 ++
 dir.c            | 5 +++--
 dir.h            | 3 ++-
 wt-status.c      | 3 +++
 wt-status.h      | 1 +
 5 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index d6dd3df..6a5ba11 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1158,6 +1158,8 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	unsigned char sha1[20];
 	static struct option builtin_status_options[] = {
 		OPT__VERBOSE(&verbose, N_("be verbose")),
+		OPT_BOOLEAN('c', "changed-only", &s.check_changed_only,
+			    N_("Ignore untracked files. Check if files known to git are modified")),
 		OPT_SET_INT('s', "short", &status_format,
 			    N_("show status concisely"), STATUS_FORMAT_SHORT),
 		OPT_BOOLEAN('b', "branch", &s.show_branch,
diff --git a/dir.c b/dir.c
index a473ca2..555b652 100644
--- a/dir.c
+++ b/dir.c
@@ -1274,8 +1274,9 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const char
 		return dir->nr;
 
 	simplify = create_simplify(pathspec);
-	if (!len || treat_leading_path(dir, path, len, simplify))
-		read_directory_recursive(dir, path, len, 0, simplify);
+	if ((!(dir->flags & DIR_CHECK_CHANGED_ONLY)) &&
+	    (!len || treat_leading_path(dir, path, len, simplify)))
+		read_directory_recursive(dir, path, len, 0, simplify);
 	free_simplify(simplify);
 	qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name);
 	qsort(dir->ignored, dir->ignored_nr, sizeof(struct dir_entry *), cmp_name);
diff --git a/dir.h b/dir.h
index f5c89e3..1a915a7 100644
--- a/dir.h
+++ b/dir.h
@@ -41,7 +41,8 @@ struct dir_struct {
 		DIR_SHOW_OTHER_DIRECTORIES = 1<<1,
 		DIR_HIDE_EMPTY_DIRECTORIES = 1<<2,
 		DIR_NO_GITLINKS = 1<<3,
-		DIR_COLLECT_IGNORED = 1<<4
+		DIR_COLLECT_IGNORED = 1<<4,
+		DIR_CHECK_CHANGED_ONLY = 1<<5
 	} flags;
 	struct dir_entry **entries;
 	struct dir_entry **ignored;
diff --git a/wt-status.c b/wt-status.c
index d7cfe8f..b315785 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -503,6 +503,9 @@ static void wt_status_collect_untracked(struct wt_status *s)
 	if (s->show_untracked_files != SHOW_ALL_UNTRACKED_FILES)
 		dir.flags |=
 			DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	if (s->check_changed_only)
+		dir.flags |= DIR_CHECK_CHANGED_ONLY;
+
 	setup_standard_excludes(&dir);
 
 	fill_directory(&dir, s->pathspec);
diff --git a/wt-status.h b/wt-status.h
index 236b41f..7eb0115 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -47,6 +47,7 @@ struct wt_status {
 	const char **pathspec;
 	int verbose;
 	int amend;
+	int check_changed_only;
 	enum commit_whence whence;
 	int nowarn;
 	int use_color;
-- 
1.8.2.rc2
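
For readers who want to see the gating idea from the dir.c hunk in
isolation, here is a minimal standalone sketch. It is not git code:
every toy_* name and TOY_DIR_CHECK_CHANGED_ONLY are hypothetical
stand-ins, and toy_walk() only pretends to be the expensive
opendir()/readdir() traversal. It merely illustrates that when the
flag bit is set, the walk never runs and no untracked entries are
collected.

```c
#include <stddef.h>

/* Hypothetical stand-in for the new flag bit; not git's real enum. */
enum toy_dir_flags {
	TOY_DIR_CHECK_CHANGED_ONLY = 1 << 5
};

/* Hypothetical, heavily reduced stand-in for struct dir_struct. */
struct toy_dir {
	unsigned flags;
	int nr;		/* untracked entries collected so far */
	int walks;	/* how often the expensive walk was started */
};

/* Pretend directory walk: records that it ran and what it "found". */
static void toy_walk(struct toy_dir *dir, int entries_on_disk)
{
	dir->walks++;
	dir->nr += entries_on_disk;
}

/*
 * Same shape as the patched condition: skip the walk entirely
 * when the "changed only" bit is set in dir->flags.
 */
static int toy_read_directory(struct toy_dir *dir, int entries_on_disk)
{
	if (!(dir->flags & TOY_DIR_CHECK_CHANGED_ONLY))
		toy_walk(dir, entries_on_disk);
	return dir->nr;
}
```

With flags left at 0 the call behaves as before and reports whatever
the walk finds; with TOY_DIR_CHECK_CHANGED_ONLY set it returns 0
without touching the (pretend) filesystem, which is where the saved
sys and user time would be expected to come from.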