On 5/3/21 10:12 PM, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@xxxxxxxxx> > > When there are many renames between the old base of a series of commits > and the new base for a series of commits, the sequence of merges > employed to transplant those commits (from a cherry-pick or rebase > operation) will repeatedly detect the exact same renames. This is > wasted effort. > > Add data structures which will be used to cache rename detection > results, along with the initialization and deallocation of these data > structures. Future commits will populate these caches, detect the > appropriate circumstances when they can be used, and employ them to > avoid re-detecting the same renames repeatedly. I appreciate the definitions and boilerplate for these data structures being isolated to their own patch. > @@ -140,6 +140,37 @@ struct rename_info { > int callback_data_nr, callback_data_alloc; > char *callback_data_traverse_path; > > + /* > + * cached_pairs: Caching of renames and deletions. > + * > + * These are mappings recording renames and deletions of individual > + * files (not directories). They are thus a map from an old > + * filename to either NULL (for deletions) or a new filename (for > + * renames). > + */ > + struct strmap cached_pairs[3]; > + > + /* > + * cached_target_names: just the destinations from cached_pairs > + * > + * We sometimes want a fast lookup to determine if a given filename > + * is one of the destinations in cached_pairs. cached_target_names > + * is thus duplicative information, but it provides a fast lookup. > + */ > + struct strset cached_target_names[3]; These two work well together. Very clear. > + /* > + * cached_irrelevant: Caching of rename_sources that aren't relevant. > + * > + * cached_pairs records both renames and deletes. Sometimes we > + * do not know if a path is a rename or a delete because we pass > + * RELEVANT_LOCATION to diffcore_rename_extended() and based on > + * various optimizations it returns without detecting whether that > + * path is actually a rename or a delete. We need to cache such > + * paths too, but separately from cached_pairs. > + */ > + struct strset cached_irrelevant[3]; I'm having a hard time parsing what these "irrelevant" paths will be. It seems like diffcore_rename_extended() will report something other than "rename" or "delete" for some paths. Could we explicitly mark that state as "irrelevant"? /* * cached_irrelevant: Caching of rename_sources that aren't relevant. * * cached_pairs records both renames and deletes. Sometimes we * do not know if a path is a rename or a delete because we pass * RELEVANT_LOCATION to diffcore_rename_extended() which might * describe a path as "irrelevant" instead of as a "rename" or "delete". * We need to cache such paths too, but separately from cached_pairs. */ Does this make sense? diffcore_rename_extended() might need an update to match this extra, explicit state. The rest of the code looks good. Thanks, -Stolee