Re: [PATCH v2] checkout: avoid unnecessary match_pathspec calls

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nguyễn Thái Ngọc Duy  <pclouds@xxxxxxxxx> writes:

> In checkout_paths() we do this
>
>  - for all updated items, call match_pathspec
>  - for all items, call match_pathspec (inside unmerge_cache)
>  - for all items, call match_pathspec (for showing "path .. is unmerged)
>  - for updated items, call match_pathspec and update paths
>
> That's a lot of duplicate match_pathspec(s) and the function is not
> exactly cheap to be called so many times, especially on large indexes.
> This patch makes it call match_pathspec once per updated index entry,
> save the result in ce_flags and reuse the results in the following
> loops.
>
> The changes in 0a1283b (checkout $tree $path: do not clobber local
> changes in $path not in $tree - 2011-09-30) limit the affected paths
> to ones we read from $tree. We do not do anything to other modified
> entries in this case, so the "for all items" above could be modified
> to "for all updated items". But..
>
> The command's behavior now is modified slightly: unmerged entries that
> match $path, but not updated by $tree, are now NOT touched.  Although
> this should be considered a bug fix, not a regression.

Could we have a test to show the difference please, especially if we
are going to sell this as a fix?

The change itself looks quite sane to me (I didn't apply or test it,
though---just eyeballing).

Thanks.

>
> And while at there, free ps_matched after use.
>
> The following command is tested on webkit, 215k entries. The pattern
> is chosen mainly to make match_pathspec sweat:
>
> git checkout -- "*[a-zA-Z]*[a-zA-Z]*[a-zA-Z]*"
>
>         before      after
> real    0m3.493s    0m2.737s
> user    0m2.239s    0m1.586s
> sys     0m1.252s    0m1.151s
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
> ---
>  > and worry about larger ones later, so if there were another choice,
>  > i.e.
>  >
>  >  - eject nd/magic-pathspecs for now, cook this (and other small
>  >    independent and clear improvements you may come up with, some of
>  >    which might come out of nd/magic-pathspecs itself) and let it
>  >    graduate first, and later revisit rerolld nd/magic-pathspecs
>  >
>  > that would be the ideal among the given choices ;-).
>
>  Whichever is easier for you.
>
>  > The above is a faithful rewrite, but I have to wonder why you need
>  > two separate loops.
>  >
>  > Do you understand what the original loop is doing with ps_matched,
>  > and why the code excludes certain paths while doing so?
>
>  Nope, I did not dig that deep. I expected you to do it ;-) j/k
>
>  > After commenting on the above, it makes me wonder if we even need to
>  > bother marking entries that were in the index that did not come from
>  > the tree-ish we are checking paths out of, though.  What breaks if
>  > you did not do the rewrite above and dropped the second loop in your
>  > patch?
>
>  The test suite says none. There is a behavior change regarding
>  unmerged entries as mentioned in the commit message. But I think it's
>  a good change.
>
>  builtin/checkout.c | 34 +++++++++++++++++++++++++++-------
>  cache.h            |  1 +
>  resolve-undo.c     | 19 ++++++++++++++++++-
>  resolve-undo.h     |  1 +
>  4 files changed, 47 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index a9c1b5a..359b983 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -271,24 +271,46 @@ static int checkout_paths(const struct checkout_opts *opts,
>  		;
>  	ps_matched = xcalloc(1, pos);
>  
> +	/*
> +	 * Make sure all pathspecs participated in locating the paths
> +	 * to be checked out.
> +	 */
>  	for (pos = 0; pos < active_nr; pos++) {
>  		struct cache_entry *ce = active_cache[pos];
> +		ce->ce_flags &= ~CE_MATCHED;
>  		if (opts->source_tree && !(ce->ce_flags & CE_UPDATE))
> +			/*
> +			 * "git checkout tree-ish -- path", but this entry
> +			 * is in the original index; it will not be checked
> +			 * out to the working tree and it does not matter
> +			 * if pathspec matched this entry.  We will not do
> +			 * anything to this entry at all.
> +			 */
>  			continue;
> -		match_pathspec(opts->pathspec, ce->name, ce_namelen(ce), 0, ps_matched);
> +		/*
> +		 * Either this entry came from the tree-ish we are
> +		 * checking the paths out of, or we are checking out
> +		 * of the index.
> +		 */
> +		if (match_pathspec(opts->pathspec, ce->name, ce_namelen(ce),
> +				   0, ps_matched))
> +			ce->ce_flags |= CE_MATCHED;
>  	}
>  
> -	if (report_path_error(ps_matched, opts->pathspec, opts->prefix))
> +	if (report_path_error(ps_matched, opts->pathspec, opts->prefix)) {
> +		free(ps_matched);
>  		return 1;
> +	}
> +	free(ps_matched);
>  
>  	/* "checkout -m path" to recreate conflicted state */
>  	if (opts->merge)
> -		unmerge_cache(opts->pathspec);
> +		unmerge_marked_index(&the_index);
>  
>  	/* Any unmerged paths? */
>  	for (pos = 0; pos < active_nr; pos++) {
>  		struct cache_entry *ce = active_cache[pos];
> -		if (match_pathspec(opts->pathspec, ce->name, ce_namelen(ce), 0, NULL)) {
> +		if (ce->ce_flags & CE_MATCHED) {
>  			if (!ce_stage(ce))
>  				continue;
>  			if (opts->force) {
> @@ -313,9 +335,7 @@ static int checkout_paths(const struct checkout_opts *opts,
>  	state.refresh_cache = 1;
>  	for (pos = 0; pos < active_nr; pos++) {
>  		struct cache_entry *ce = active_cache[pos];
> -		if (opts->source_tree && !(ce->ce_flags & CE_UPDATE))
> -			continue;
> -		if (match_pathspec(opts->pathspec, ce->name, ce_namelen(ce), 0, NULL)) {
> +		if (ce->ce_flags & CE_MATCHED) {
>  			if (!ce_stage(ce)) {
>  				errs |= checkout_entry(ce, &state, NULL);
>  				continue;
> diff --git a/cache.h b/cache.h
> index c56315c..04e6090 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -161,6 +161,7 @@ struct cache_entry {
>  
>  #define CE_UNPACKED          (1 << 24)
>  #define CE_NEW_SKIP_WORKTREE (1 << 25)
> +#define CE_MATCHED           (1 << 26)
>  
>  /*
>   * Extended on-disk flags
> diff --git a/resolve-undo.c b/resolve-undo.c
> index 72b4612..639eb9c 100644
> --- a/resolve-undo.c
> +++ b/resolve-undo.c
> @@ -118,7 +118,7 @@ int unmerge_index_entry_at(struct index_state *istate, int pos)
>  	struct cache_entry *ce;
>  	struct string_list_item *item;
>  	struct resolve_undo_info *ru;
> -	int i, err = 0;
> +	int i, err = 0, matched;
>  
>  	if (!istate->resolve_undo)
>  		return pos;
> @@ -137,6 +137,7 @@ int unmerge_index_entry_at(struct index_state *istate, int pos)
>  	ru = item->util;
>  	if (!ru)
>  		return pos;
> +	matched = ce->ce_flags & CE_MATCHED;
>  	remove_index_entry_at(istate, pos);
>  	for (i = 0; i < 3; i++) {
>  		struct cache_entry *nce;
> @@ -144,6 +145,8 @@ int unmerge_index_entry_at(struct index_state *istate, int pos)
>  			continue;
>  		nce = make_cache_entry(ru->mode[i], ru->sha1[i],
>  				       ce->name, i + 1, 0);
> +		if (matched)
> +			nce->ce_flags |= CE_MATCHED;
>  		if (add_index_entry(istate, nce, ADD_CACHE_OK_TO_ADD)) {
>  			err = 1;
>  			error("cannot unmerge '%s'", ce->name);
> @@ -156,6 +159,20 @@ int unmerge_index_entry_at(struct index_state *istate, int pos)
>  	return unmerge_index_entry_at(istate, pos);
>  }
>  
> +void unmerge_marked_index(struct index_state *istate)
> +{
> +	int i;
> +
> +	if (!istate->resolve_undo)
> +		return;
> +
> +	for (i = 0; i < istate->cache_nr; i++) {
> +		struct cache_entry *ce = istate->cache[i];
> +		if (ce->ce_flags & CE_MATCHED)
> +			i = unmerge_index_entry_at(istate, i);
> +	}
> +}
> +
>  void unmerge_index(struct index_state *istate, const char **pathspec)
>  {
>  	int i;
> diff --git a/resolve-undo.h b/resolve-undo.h
> index 8458769..7a30206 100644
> --- a/resolve-undo.h
> +++ b/resolve-undo.h
> @@ -12,5 +12,6 @@ extern struct string_list *resolve_undo_read(const char *, unsigned long);
>  extern void resolve_undo_clear_index(struct index_state *);
>  extern int unmerge_index_entry_at(struct index_state *, int);
>  extern void unmerge_index(struct index_state *, const char **);
> +extern void unmerge_marked_index(struct index_state *);
>  
>  #endif
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]