Re: [PATCH 1/4] fsmonitor: use fsmonitor data in `git diff`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Alex Vandiver via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Alex Vandiver <alexmv@xxxxxxxxxxx>
>
> With fsmonitor enabled, the first call to match_stat_with_submodule
> calls refresh_fsmonitor, incurring the overhead of reading the list of
> updated files -- but run_diff_files does not respect the
> CE_FSMONITOR_VALID flag.

run_diff_files() is used not just by "git diff" but other things
like "git add", so if we get an overall speed-up without having to
pay undue cost, that would be a very good news.

> diff --git a/diff-lib.c b/diff-lib.c
> index f95c6de75f..b7ee1b89ef 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -97,6 +97,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  
>  	diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");
>  
> +	refresh_fsmonitor(istate);
> +

"git diff" and friends are often run with pathspec, but the API into
the fsmonitor, refresh_fsmonitor() call, has no way to say "I only
am interested in the status of this directory and everything else
does not matter".  How expensive would this call to accept fsmonitor
data for the entire tree be, and would there eventually be a point
where the number of paths we are interested in checking (i.e. the
paths that would match the pathspec) is so small that we would be
better off not making this call?  E.g. if we are checking more than
20% of the working tree, running refresh_fsmonitor() for the entire
working tree is still a win, but if we are only checking less than
that, we are better off without fsmonitor, or does a tradeoff like
that exist?

> @@ -197,8 +199,19 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  		if (ce_uptodate(ce) || ce_skip_worktree(ce))
>  			continue;
>  
> -		/* If CE_VALID is set, don't look at workdir for file removal */
> -		if (ce->ce_flags & CE_VALID) {
> +		/*
> +		 * If CE_VALID is set, the user has promised us that the workdir
> +		 * hasn't changed compared to index, so don't stat workdir
> +		 * for file removal

The above seems to be an attempt to elaborate on the existing
comment, but ...

> +		 *  eg - via git udpate-index --assume-unchanged
> +		 *  eg - via core.ignorestat=true

... what are these two lines doing here?  It makes no sense to say
"Don't stat workdir for file removal by doing 'git update-index' or
by seetting core.ignorestat", but the placement of these two lines
makes it look as if that is what you are saying.  Perhaps

	When CE_VALID is set (via "update-index --assume-unchanged"
	or via adding paths while core.ignorestat is set to true),
	the user has promised ..., so don't stat workdir for removed
	files.

would probably be what you meant bo say.

> +		 * When using FSMONITOR:
> +		 * If CE_FSMONITOR_VALID is set, then we know the metadata on disk
> +		 * has not changed since the last refresh, and we can skip the
> +		 * file-removal checks without doing the stat in check_removed.

An iffy description.  You skip all the file-removal check by not
calling check_removed() as a whole.

This is not the fault of this patch, but in any case, the
description places too much stress on "removal" when in reality,
removal is not all that special in this codepath.  The check_removed
call also contributes to noticiing modified (not removed) files.  If
we are updating the comment here, we should correct that too,
perhaps

	When CE_VALID is set (via "update-index --assume-unchanged"
	or via adding paths while core.ignorestat is set to true),
	the user has promised that the working tree file for that
	path will not be modified.  When CE_FSMONITOR_VALID is true,
	the fsmonitor knows that the path hasn't been modified since
	we refreshed the cached stat information.  In either case,
	we do not have to stat to see if the path has been removed
	or modified.

or something like that, perhaps.

> +		 */
> +		if (ce->ce_flags & CE_VALID || ce->ce_flags & CE_FSMONITOR_VALID) {

Would it become easier to read, if written like this instead?

		if (ce->ce_flags & (CE_VALID | CE_FSMONITOR_VALID)) {

That reflects what the suggested comment says better.

>  			changed = 0;
>  			newmode = ce->ce_mode;
>  		} else {

Thanks.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux