Re: [PATCH v8] status: modernize git-status "slow untracked files" advice

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 25 Nov 2022 13:58:43 +0900

"Rudy Rigot via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Rudy Rigot <rudy.rigot@xxxxxxxxx>
>
> `git status` can be slow when there are a large number of
> untracked files and directories since Git must search the entire
> worktree to enumerate them.  When it is too slow, Git prints
> advice with the elapsed search time and a suggestion to disable
> the search using the `-uno` option.  This suggestion also carries
> a warning that might scare off some users.
>
> However, these days, `-uno` isn't the only option.  Git can reduce
> the size and time of the untracked file search when the

"time" I can sort of understand ("can reduce the time taken to
enumerate untracked files" is how I may phrase it, though), but
what did you want to say with "size"?

> `core.untrackedCache` and `core.fsmonitor` features are enabled by
> caching results from previous `git status` invocations.
>
> Therefore, update the `git status` man page to explain the various
> configuration options, and update the advice to provide more ...

Lose "Therefore, "; the resulting text would be much easier to
follow.

> +UNTRACKED FILES AND STATUS SPEED

"STATUS SPEED" somehow does not sound quite grammatical.  Perhaps
"untracked files and performance" or something instead?

> +--------------------------------
> +
> +`git status` can be very slow in large worktrees if/when it
> +needs to search for untracked files and directories. There are
> +many configuration options available to speed this up by either
> +avoiding the work or making use of cached results from previous
> +Git commands. There is no single optimum set of settings right
> +for everyone.  Here is a brief summary of the relevant options
> +to help you choose which is right for you.

Good.

> +* First, you may want to run `git status` again. Your current
> +	configuration may already be caching `git status` results,
> +	so it could be faster on subsequent runs.

The above may be good advice, but it is misleading to present it as
if it were another alternative on an equal footing with everything
else listed.  The resulting text would likely be much easier to
follow if you fold it into "Here is a summary", perhaps like...

    ... right for everyone.  We'll list a summary of the relevant
    options to help you, but before going into the list, you may
    want to run `git status` again, because your configuration may
    already be ...

> +* The `--untracked-files=no` flag or the
> +	`status.showUntrackedfiles=false` config (see above for both) :

Lose the SP before the ":" (applies to all other entries, too).

> +	indicate that `git status` should not report untracked
> +	files. This is the fastest option. `git status` will not list
> +	the untracked files, so you need to be careful to remember if
> +	you create any new files and manually `git add` them.

OK.

> +* `advice.statusUoption=false` (see linkgit:git-config[1]) :
> +	this config option disables a warning message when the search
> +	for untracked files takes longer than desired. In some large
> +	repositories, this message may appear frequently and not be a
> +	helpful signal.

This is not technically wrong per se, except that "desired" in
"takes longer than desired" may simply be wrong.

The reason why the message may not be a "helpful signal" is that in
such a repository and project the user may have already accepted the
current trade-off as _desirable_, iow, the user is WILLING to wait
for 2 seconds.  And in such a case, it indeed is the most sensible
option to disable the advice.

We should also stress the fact that this has nothing to do with
speeding things up, unlike the other pieces of advice you are giving
here.  It's not like disabling the advice will allow us to omit
something we need to do to compute the advice (in other words, if the
overhead to measure the time taken to list untracked files were
large, this might matter, but that is hardly the case).

Perhaps

    Setting this variable to `false` disables the warning message
    given when enumerating untracked files takes more than 2
    seconds.  In a large project, it may take longer and the user
    may have already accepted the trade-off (e.g. using "-uno" may
    not be an acceptable option for the user), in which case there
    is no point in issuing the warning message, and disabling it
    may be the best option.

or something like that.

> +test_expect_success 'setup slow status advice' '
> +	GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main git init slowstatus &&
> +	(
> +		cd slowstatus &&
> +		cat >.gitignore <<-\EOF &&
> +		/actual
> +		/expected
> +		/out
> +		EOF
> +		git add .gitignore &&
> +		git commit -m "Add .gitignore" &&
> +		git config advice.statusuoption true
> +	)
> +'
> +
> +test_expect_success 'slow status advice when core.untrackedCache and fsmonitor are unset' '
> +	(
> +		cd slowstatus &&
> +		git config core.untrackedCache false &&
> +		git config core.fsmonitor false &&
> +		GIT_TEST_UF_DELAY_WARNING=1 git status >out &&
> +		sed "s/[0-9]\.[0-9][0-9]/X/g" out >actual &&

What if it takes more than 10 seconds, e.g.

	"It took 92.34 seconds to enumerate..."

Wouldn't it be redacted into "It took 9X seconds to enumerate"?
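
To illustrate (a quick sketch run outside the test framework, not
part of the patch; the function and variable names are made up for
this example), the pattern from the test leaves the leading digit
behind, while a variant that also swallows the digits before the
dot does not:

```shell
# The pattern used in the test: one digit, a dot, two digits.
redact_narrow () {
	sed "s/[0-9]\.[0-9][0-9]/X/g"
}

# A variant that also consumes any digits before the dot.
redact_wide () {
	sed "s/[0-9][0-9]*\.[0-9][0-9]/X/"
}

msg="It took 92.34 seconds to enumerate untracked files."
narrow=$(echo "$msg" | redact_narrow)
wide=$(echo "$msg" | redact_wide)
echo "$narrow"
echo "$wide"
```

The first line printed still carries the "9", so it would not match
an expectation written with a bare "X".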

It probably does not happen, only because you are forcing the code
to pretend that it took 2.001 seconds or something, I suspect.  But
if you are forcing with GIT_TEST_UF_DELAY_WARNING to pretend that it
took some unacceptably long time, it may be more robust to

 * pass "struct wt_status *s" to uf_was_slow(), instead of passing
   s->untracked_in_ms

 * when GIT_TEST_UF_DELAY_WARNING tells us we are pretending a long
   delay for the purpose of running tests, ASSIGN a known value to
   s->untracked_in_ms

 * get rid of "out" and use of "sed" in these tests, and instead
   check for exact output.

e.g.

	static int uf_was_slow(struct wt_status *s)
	{
		/* the field is in milliseconds; 3250 is shown as 3.25 seconds */
		if (getenv("GIT_TEST_UF_DELAY_WARNING"))
			s->untracked_in_ms = 3250;
		return UF_DELAY_WARNING_IN_MS < s->untracked_in_ms;
	}

plus

	GIT_TEST_UF_DELAY_WARNING=1 git status >actual &&
	cat >expect <<-\EOF &&
	...
	It took 3.25 seconds to enumerate ...
	EOF
	test_cmp expect actual

Also, what do you need the /g modifier in the "sed" script for?  I do
not think we give more than one such number in the message we are
testing.

> +		cat >expected <<-\EOF &&
> +		On branch main
> +
> +		It took X seconds to enumerate untracked files.
> +		See '"'"'git help status'"'"' for information on how to improve this.

This is not wrong per se, but it is more customary to say:

		See '\''git help status'\'' for information on ...
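
Both spellings yield the same string once the shell has processed
the single-quoted test body.  A quick check outside the test
framework (a sketch only; the variable names are made up for
illustration):

```shell
# Close the single-quoted string, add a double-quoted ', reopen it.
long_form='See '"'"'git help status'"'"' for information'

# Close the single-quoted string, add a backslash-escaped ', reopen it.
short_form='See '\''git help status'\'' for information'

echo "$long_form"
echo "$short_form"
```

The shorter form is the one more commonly seen in our test scripts.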

All of the comments for this test apply to the other two new tests.

> +		nothing to commit, working tree clean
> +		EOF
> +		test_cmp expected actual
> +	)
> +'

Additionally (read: you do not _have_ to do this to make this topic
acceptable, but it probably is worth thinking about), if we need to
introduce a new helper function uf_was_slow() anyway, a much better
change may be to make the 2-second cut-off configurable, rather than
inventing GIT_TEST_UF_DELAY_WARNING used only for tests.  You can
introduce, say, "status.enumerateUntrackedDelayMS", a configuration
variable that can be set to override the hardcoded 2000 milliseconds
(i.e. UF_DELAY_WARNING_IN_MS) to control what delay is acceptable
for the repository.

Then you can run the tests with the configuration set to a negative
value (i.e. no time is acceptably short, even 0 milliseconds).  If
you go that route, then you do need to redirect to "out" and redact
with "sed" (make sure you are prepared to see a delay of more than
10 seconds in such a case).
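
To spell out why the value would have to be negative rather than
zero, here is a sketch only: the shell function below merely mimics
the strict "threshold < elapsed" comparison from the uf_was_slow()
suggested above, and its second argument stands in for the
hypothetical configuration variable, which does not exist yet.

```shell
# uf_was_slow ELAPSED_MS THRESHOLD_MS
# Succeeds when the threshold is strictly less than the elapsed time.
uf_was_slow () {
	test "$2" -lt "$1"
}

# report ELAPSED_MS THRESHOLD_MS: say whether the advice would be shown.
report () {
	if uf_was_slow "$1" "$2"
	then
		echo "advice"
	else
		echo "quiet"
	fi
}

zero_threshold=$(report 0 0)          # 0 < 0 is false
negative_threshold=$(report 0 -1)     # -1 < 0 is true
default_threshold=$(report 2001 2000) # 2000 < 2001 is true
echo "$zero_threshold $negative_threshold $default_threshold"
```

With a strict comparison, a threshold of 0 would stay quiet for an
enumeration that is measured as taking 0 milliseconds, which is why
a negative value is what forces the advice unconditionally.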

Thanks.