On Thu, Aug 02, 2012 at 07:08:25PM +0700, Nguyen Thai Ngoc Duy wrote:

> > I implemented (1a). Implementing (1b) would be easy, but for a full-on
> > cache (especially for "-C"), I think the resulting size might be
> > prohibitive.
>
> (1a) is good regardless rename overrides. Why don't you polish and
> submit it? We can set some criteria to limit the cache size while
> keeping computation reasonably low. Caching rename scores for file
> pairs that has file size larger than a limit is one. Rename matrix
> size could also be a candidate. We could even cache just rename scores
> for recent commits (i.e. close to heads) only with the assumption that
> people diff/apply recent commits more often.

I'll polish and share it. I'm still not 100% sure it's a good idea,
because introducing an on-disk cache means we need to _manage_ that
cache. How big will it be? Who will prune it when it gets too big? By
what criteria? And so on.

But if it's all hidden behind a config option, then it won't hurt people
who don't use it. And people who do use it can gather data on how the
caches grow.

> > All solutions under (2) suffer from the same problem: they are accurate
> > only for a single diff. For other diffs, you would either have to not
> > use the feature, or you would be stuck traversing the history and
> > assigning a temporary file identity (e.g., given commits A->B->C, and in
> > A->B we rename "foo" to "bar", the diff between A and C could discover
> > that A's "foo" corresponds to C's "bar").
>
> Yeah. If we go with manual overrides, I expect users to deal with
> these manually too. IOW they'll need to create a mapping for A->C
> themselves. We can help detect that there are manual overrides in some
> cases, like merge, and let users know that manual overrides are
> ignored. For merge, I think we can just check for all commits while
> traversing looking for bases.

Yeah, merges are a special case, in that we know the diff we perform
will always have a direct-ancestor relationship (since it is always
between a tip and the merge base).

> > But there is not much point in making it machine-readable, since the
> > interesting machine-readable things we do with renames are:
> >
> >   1. Show the diff against the rename src, which can often be easier to
> >      read. Except that if rename detection did not find it, it is
> >      probably _not_ going to be easier to read.
>
> Probably. Still it helps "git log --follow" to follow the correct
> track in the 1% case that rename detection does go wrong.

Thanks. I didn't think of --follow, but that is a good counterpoint to
my argument.

> >   2. Applying content to the destination of a merge. But you're almost
> >      never doing the diff between a commit and its parent, so the
> >      information would be useless.
>
> Having a way to interfere rename detection, even manually, could be
> good in this case if it reduces conflicts. We could feed rename
> overrides using command line.

Yeah. I think I'd start with letting you feed pairs to diff_options,
give it a command-line option to see how useful it is, and then later on
consider a mechanism for extracting those pairs automatically from
commits or notes.

-Peff
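
To make the A->B->C case above concrete, here is a rough sketch of how
per-commit rename pairs might be composed along a direct-ancestor chain
(the situation a merge is in, diffing a tip against the merge base).
This is not git code: compose_renames and its arguments are made up for
illustration, the per-step pairs are assumed to be supplied already
(say, as manual overrides), and deletions are ignored.

```python
def compose_renames(start_paths, steps):
    """Follow each starting path through a linear chain of commits.

    start_paths: paths present in the oldest commit (hypothetical input).
    steps: list of dicts, oldest first; each maps old path -> new path
           for one parent->child step (assumed given, e.g. as overrides).
    Returns a dict of {path in oldest commit: path in newest commit},
    containing only the paths whose name changed.
    """
    # Every starting path begins under its own name.
    current = {path: path for path in start_paths}
    for renames in steps:
        # Move each tracked file to its new name, if this step renames it.
        # Keying on the *current* name keeps a file created later under a
        # reused name from being confused with the original.
        current = {orig: renames.get(now, now) for orig, now in current.items()}
    return {orig: now for orig, now in current.items() if orig != now}


# A->B renames "foo" to "bar"; B->C leaves it alone.
print(compose_renames(["foo", "Makefile"], [{"foo": "bar"}, {}]))
```

The result is the A->C pair ("foo", "bar"), which is the kind of pair
one could then hand to the diff machinery instead of asking the user to
write the A->C mapping by hand.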