Re: [PATCH v2 1/4] diffcore-rename: compute basenames of all source and dest candidates

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/9/2021 6:32 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@xxxxxxxxx>
> 
> We want to make use of unique basenames to help inform rename detection,
> so that more likely pairings can be checked first.  (src/moduleA/foo.txt
> and source/module/A/foo.txt are likely related if there are no other
> 'foo.txt' files among the deleted and added files.)  Add a new function,
> not yet used, which creates a map of the unique basenames within
> rename_src and another within rename_dst, together with the indices
> within rename_src/rename_dst where those basenames show up.  Non-unique
> basenames still show up in the map, but have an invalid index (-1).
> 
> This function was inspired by the fact that in real world repositories,
> most renames often do not involve a basename change.  Here are some
> sample repositories and the percentage of their historical renames (as of
> early 2020) that did not involve a basename change:

I found this difficult to parse. Perhaps instead

  "the percentage of their renames that preserved basenames".

We might also need something stronger, though: which percentage of renames
preserved the basename but also had no other copy of that basename in the
scope of all add/deletes?

Is this reproducible from a shell command that could be documented here?

> +MAYBE_UNUSED
> +static int find_basename_matches(struct diff_options *options,
> +				 int minimum_score,
> +				 int num_src)
> +{
> +	int i;
> +	struct strintmap sources;
> +	struct strintmap dests;
> +
> +	/* Create maps of basename -> fullname(s) for sources and dests */
> +	strintmap_init_with_options(&sources, -1, NULL, 0);
> +	strintmap_init_with_options(&dests, -1, NULL, 0);

Initially, I was wondering why we need the map for each side, but we will need
to enforce uniqueness in each set, so OK.

> +	for (i = 0; i < num_src; ++i) {
> +		char *filename = rename_src[i].p->one->path;
> +		char *base;
> +
> +		/* exact renames removed in remove_unneeded_paths_from_src() */
> +		assert(!rename_src[i].p->one->rename_used);
> +
> +		base = strrchr(filename, '/');
> +		base = (base ? base+1 : filename);

nit: "base + 1"

Also, this is used here and below. Perhaps it's worth pulling out as a
helper? I see similar code being duplicated in these existing spots:

* diff-no-index.c:append_basename()
* help.c:append_similar_ref()
* packfile.c:pack_basename()
* replace-object.c:register_replace_ref()
* setup.c:read_gitfile_gently()
* builtin/rebase.c:cmd_rebase()
* builtin/stash.c:do_create_stash()
* builtin/worktree.c:add_worktree()
* contrib/credential/gnome-keyring/git-credential-gnome-keyring.c:usage()
* contrib/credential/libsecret/git-credential-libsecret.c:usage()
* trace2/tr2_dst.c:tr2_dst_try_auto_path()

There are other places that use strchr(_, '/') but they seem to be related
to peeling basenames off of paths and using the leading portion of the path.

> +		/* Record index within rename_src (i) if basename is unique */
> +		if (strintmap_contains(&sources, base))
> +			strintmap_set(&sources, base, -1);
> +		else
> +			strintmap_set(&sources, base, i);
> +	}
> +	for (i = 0; i < rename_dst_nr; ++i) {
> +		char *filename = rename_dst[i].p->two->path;
> +		char *base;
> +
> +		if (rename_dst[i].is_rename)
> +			continue; /* involved in exact match already. */
> +
> +		base = strrchr(filename, '/');
> +		base = (base ? base+1 : filename);
> +
> +		/* Record index within rename_dst (i) if basename is unique */
> +		if (strintmap_contains(&dests, base))
> +			strintmap_set(&dests, base, -1);
> +		else
> +			strintmap_set(&dests, base, i);
> +	}
> +
> +	/* TODO: Make use of basenames source and destination basenames */
> +
> +	strintmap_clear(&sources);
> +	strintmap_clear(&dests);
> +
> +	return 0;
> +}

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux