Re: [PATCH 2/2] diffcore-rename: filter rename_src list when possible

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> +static int remove_unneeded_paths_from_src(int num_src,
> +					  int detecting_copies)
> +{
> +	int i, new_num_src;
> +
> +	/*
> +	 * Note on reasons why we cull unneeded sources but not destinations:
> +	 *   1) Pairings are stored in rename_dst (not rename_src), which we
> +	 *      need to keep around.  So, we just can't cull rename_dst even
> +	 *      if we wanted to.  But doing so wouldn't help because...
> +	 *
> +	 *   2) There is a matrix pairwise comparison that follows the
> +	 *      "Performing inexact rename detection" progress message.
> +	 *      Iterating over the destinations is done in the outer loop,
> +	 *      hence we only iterate over each of those once and we can
> +	 *      easily skip the outer loop early if the destination isn't
> +	 *      relevant.  That's only one check per destination path to
> +	 *      skip.
> +	 *
> +	 *      By contrast, the sources are iterated in the inner loop; if
> +	 *      we check whether a source can be skipped, then we'll be
> +	 *      checking it N separate times, once for each destination.
> +	 *      We don't want to have to iterate over known-not-needed
> +	 *      sources N times each, so avoid that by removing the sources
> +	 *      from rename_src here.
> +	 */
> +	if (detecting_copies)
> +		return num_src; /* nothing to remove */
> +	if (break_idx)
> +		return num_src; /* culling incompatbile with break detection */
> +
> +	for (i = 0, new_num_src = 0; i < num_src; i++) {
> +		/*
> +		 * renames are stored in rename_dst, so if a rename has
> +		 * already been detected using this source, we can just
> +		 * remove the source knowing rename_dst has its info.
> +		 */
> +		if (rename_src[i].p->one->rename_used)
> +			continue;
> +
> +		if (new_num_src < i)
> +			memcpy(&rename_src[new_num_src], &rename_src[i],
> +			       sizeof(struct diff_rename_src));
> +		new_num_src++;
> +	}
> +
> +	return new_num_src;
> +}

Essentially we are compacting rename_src[] array from num_src
elements down to new_num_src elements; we are losing pointers, but I
presume these are all borrowed pointers that we do not own and we
are not responsible for freeing?  If we were to free them, the
compaction would leave duplicates after the new tail (new_num_src)
and we'd end up having to worry about double-freeing, so hopefully
all we need to do is just to free the entire array of pointers, and
not the pointees.

Having to do this just once and being able to reduce the number of
entries we need to iterate over does sound like a good simple
optimization.

>  void diffcore_rename(struct diff_options *options)
>  {
>  	int detect_rename = options->detect_rename;
> @@ -463,10 +512,11 @@ void diffcore_rename(struct diff_options *options)
>  	struct diff_score *mx;
>  	int i, j, rename_count, skip_unmodified = 0;
>  	int num_destinations, dst_cnt;
> -	int num_sources;
> +	int num_sources, want_copies;
>  	struct progress *progress = NULL;
>  
>  	trace2_region_enter("diff", "setup", options->repo);
> +	want_copies = (detect_rename == DIFF_DETECT_COPY);
>  	if (!minimum_score)
>  		minimum_score = DEFAULT_RENAME_SCORE;
>  
> @@ -529,13 +579,10 @@ void diffcore_rename(struct diff_options *options)
>  		goto cleanup;
>  
>  	/*
> -	 * Calculate how many renames are left (but all the source
> -	 * files still remain as options for rename/copies!)
> +	 * Calculate how many renames are left
>  	 */
>  	num_destinations = (rename_dst_nr - rename_count);
> -	num_sources = rename_src_nr;
> -	if (detect_rename != DIFF_DETECT_COPY)
> -		num_sources -= rename_count;
> +	num_sources = remove_unneeded_paths_from_src(rename_src_nr, want_copies);

OK, this is in a sense an extended version of the previous step.

I am not sure if rename_src_nr can be left out of sync with reality
like this patch does, though.  The reference to that variable in
register_rename_src() and find_exact_renames() are OK as we are not
going to call them after we futz with the rename_src[] array, but
the reference in prefetch(), which does not actually happen early
but only when we start running estimate_similarity(), which is after
we compacted the rename_src[] array, would be affected, no?

Thanks.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux