Re: [PATCH v3 3/5] diffcore-rename: complete find_basename_matches()


 



"Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> +	/* Now look for basename matchups and do similarity estimation */
> +	for (i = 0; i < num_src; ++i) {
> +		char *filename = rename_src[i].p->one->path;
> +		const char *base = NULL;
> +		intptr_t src_index;
> +		intptr_t dst_index;
> +
> +		/* Find out if this basename is unique among sources */
> +		base = get_basename(filename);
> +		src_index = strintmap_get(&sources, base);
> +		if (src_index == -1)
> +			continue; /* not a unique basename; skip it */
> +		assert(src_index == i);
> +
> +		if (strintmap_contains(&dests, base)) {
> +			struct diff_filespec *one, *two;
> +			int score;
> +
> +			/* Find out if this basename is unique among dests */
> +			dst_index = strintmap_get(&dests, base);
> +			if (dst_index == -1)
> +				continue; /* not a unique basename; skip it */

It would be a lot easier to read, I suspect, if "we must have the same
singleton in dests" were expressed as a single if condition.  I.e.

		if (strintmap_contains(&dests, base) &&
		    0 <= (dst_index = (strintmap_get(&dests, base)))) {
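To make the combined condition concrete outside of git, here is a minimal sketch. It assumes a hypothetical `toy_map` (a tiny linear-scan table standing in for strintmap, storing -1 for a basename seen more than once); `toy_contains()`, `toy_get()` and `unique_dst_index()` are illustrative names, not git's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strintmap: basename -> dst index,
 * or -1 when the basename occurs more than once. */
struct toy_entry {
	const char *key;
	int val;
};

struct toy_map {
	struct toy_entry e[16];
	size_t nr;
};

static int toy_contains(const struct toy_map *m, const char *key)
{
	for (size_t i = 0; i < m->nr; i++)
		if (!strcmp(m->e[i].key, key))
			return 1;
	return 0;
}

static int toy_get(const struct toy_map *m, const char *key)
{
	for (size_t i = 0; i < m->nr; i++)
		if (!strcmp(m->e[i].key, key))
			return m->e[i].val;
	return -1; /* default when absent */
}

/* Returns the dst index when base names exactly one dst, else -1;
 * "present" and "singleton" are tested in one if condition. */
static int unique_dst_index(const struct toy_map *dests, const char *base)
{
	int dst_index;

	if (toy_contains(dests, base) &&
	    0 <= (dst_index = toy_get(dests, base)))
		return dst_index;
	return -1;
}
```

With a map holding `foo.c` -> 3 and `bar.c` -> -1, `unique_dst_index()` returns 3 for `foo.c` and -1 for both `bar.c` and an absent `baz.c`.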

It is a bit sad, though, that we iterate over the whole rename_src[]
array even though we now have a map that presumably has fewer entries
than the original array.
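A minimal sketch of walking the sources map instead of rename_src[], again assuming a hypothetical linear-scan `toy_map` in place of strintmap (with -1 marking a non-unique basename and also returned for an absent key); `pair_unique_basenames()` is an illustrative name only. One visible consequence is that pairs are visited in map order rather than rename_src[] order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strintmap: basename -> index,
 * or -1 when the basename is shared by several paths. */
struct toy_entry {
	const char *key;
	int val;
};

struct toy_map {
	struct toy_entry e[16];
	size_t nr;
};

static int toy_get(const struct toy_map *m, const char *key)
{
	for (size_t i = 0; i < m->nr; i++)
		if (!strcmp(m->e[i].key, key))
			return m->e[i].val;
	return -1; /* absent */
}

/*
 * Walk the sources map rather than the full source array and record
 * (src, dst) index pairs whose basename is a singleton on both sides.
 * Returns the number of pairs written to out_src/out_dst.
 */
static int pair_unique_basenames(const struct toy_map *sources,
				 const struct toy_map *dests,
				 int *out_src, int *out_dst)
{
	int n = 0;

	for (size_t i = 0; i < sources->nr; i++) {
		int src_index = sources->e[i].val;
		int dst_index;

		if (src_index < 0)
			continue; /* not unique among sources */
		dst_index = toy_get(dests, sources->e[i].key);
		if (dst_index < 0)
			continue; /* absent or not unique among dests */
		out_src[n] = src_index;
		out_dst[n] = dst_index;
		n++;
	}
	return n;
}
```

Only entries that are unique on the source side are ever looked up in dests, which is the saving the remark above is after.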

> +			/* Ignore this dest if already used in a rename */
> +			if (rename_dst[dst_index].is_rename)
> +				continue; /* already used previously */

Since we will only be matching between unique entries in src and
dst, this "this has been used, so we cannot use it" check will not
change during this loop.  I wonder if the preparation done in the
previous step, i.e. [PATCH v3 2/5], can take advantage of this fact,
i.e. a dst that has already been used (in the earlier "exact" step)
would not even have to be in the &dests map, so that the
strintmap_contains() check can reject it much earlier.

Stepping back a bit, it appears to me that [2/5] and [3/5] consider
a source file as having a unique basename among the sources even if
there are many files with the same basename, as long as all the other
files with that basename have been matched in the previous "exact"
phase.  They probably do the same thing for the destination side.

Intended?

It feels incompatible with the spirit of what these two steps aim for
(i.e. only use this optimization on a pair of src/dst with UNIQUE
basenames).  For the purpose of "we only handle unique ones",
shouldn't the paths that have already been matched participate in
deciding whether the files that survived the "exact" phase have a
unique basename among the original input?
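A small sketch of the difference in question, using a hypothetical `count_basename()` helper over plain path and "used" arrays (not git code). With sources `a/foo.c` and `b/foo.c`, where `b/foo.c` was paired by the exact phase, counting only survivors makes `foo.c` look unique, while counting the original input does not.

```c
#include <assert.h>
#include <string.h>

static const char *toy_basename(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * Count how many paths carry the basename `base`.  With
 * include_used == 0 only paths the exact phase left unmatched are
 * counted (the behaviour questioned above); with include_used == 1
 * every path in the original input is counted.
 */
static int count_basename(const char *const *paths, const int *used,
			  int nr, const char *base, int include_used)
{
	int count = 0;

	for (int i = 0; i < nr; i++) {
		if (!include_used && used[i])
			continue;
		if (!strcmp(toy_basename(paths[i]), base))
			count++;
	}
	return count;
}
```

Here `count_basename(src, used, 3, "foo.c", 0)` yields 1 (looks unique) while the same call with `include_used == 1` yields 2 (not unique).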

Thanks.



