Re: [PATCH v2 06/13] merge-ort: add data structures for in-memory caching of rename detection

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/3/21 10:12 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@xxxxxxxxx>
> 
> When there are many renames between the old base of a series of commits
> and the new base for a series of commits, the sequence of merges
> employed to transplant those commits (from a cherry-pick or rebase
> operation) will repeatedly detect the exact same renames.  This is
> wasted effort.
> 
> Add data structures which will be used to cache rename detection
> results, along with the initialization and deallocation of these data
> structures.  Future commits will populate these caches, detect the
> appropriate circumstances when they can be used, and employ them to
> avoid re-detecting the same renames repeatedly.

I appreciate the definitions and boilerplate for these data
structures being isolated to their own patch.

> @@ -140,6 +140,37 @@ struct rename_info {
>  	int callback_data_nr, callback_data_alloc;
>  	char *callback_data_traverse_path;
>  
> +	/*
> +	 * cached_pairs: Caching of renames and deletions.
> +	 *
> +	 * These are mappings recording renames and deletions of individual
> +	 * files (not directories).  They are thus a map from an old
> +	 * filename to either NULL (for deletions) or a new filename (for
> +	 * renames).
> +	 */
> +	struct strmap cached_pairs[3];
> +
> +	/*
> +	 * cached_target_names: just the destinations from cached_pairs
> +	 *
> +	 * We sometimes want a fast lookup to determine if a given filename
> +	 * is one of the destinations in cached_pairs.  cached_target_names
> +	 * is thus duplicative information, but it provides a fast lookup.
> +	 */
> +	struct strset cached_target_names[3];

These two work well together. Very clear.

> +	/*
> +	 * cached_irrelevant: Caching of rename_sources that aren't relevant.
> +	 *
> +	 * cached_pairs records both renames and deletes.  Sometimes we
> +	 * do not know if a path is a rename or a delete because we pass
> +	 * RELEVANT_LOCATION to diffcore_rename_extended() and based on
> +	 * various optimizations it returns without detecting whether that
> +	 * path is actually a rename or a delete.  We need to cache such
> +	 * paths too, but separately from cached_pairs.
> +	 */
> +	struct strset cached_irrelevant[3];

I'm having a hard time parsing what these "irrelevant" paths will be.
It seems like diffcore_rename_extended() will report something other
than "rename" or "delete" for some paths. Could we explicitly mark
that state as "irrelevant"?

	/*
	 * cached_irrelevant: Caching of rename_sources that aren't relevant.
	 *
	 * cached_pairs records both renames and deletes.  Sometimes we
	 * do not know if a path is a rename or a delete because we pass
	 * RELEVANT_LOCATION to diffcore_rename_extended() which might
	 * describe a path as "irrelevant" instead of as a "rename" or "delete".
	 *  We need to cache such paths too, but separately from cached_pairs.
	 */

Does this make sense? diffcore_rename_extended() might need an update
to match this extra, explicit state.

The rest of the code looks good.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux