On Mon, Oct 2, 2017 at 11:51 AM, Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:
> Sorry to re-re-...-re-stir up such an old topic.
>
> I wasn't really thinking about commit-to-commit hints.
> I think these have lots of problems. (If commit A->B does
> "t/* -> tests/*" and commit B->C does "test/*.c -> xyx/*",
> then you need a way to compute a transitive closure to see
> the net-net hints for A->C. I think that quickly spirals
> out of control.)

I agree. Though as a human I can still look at A..C and give the
hint that t/*.c and xyz/*.c ought to be taken into account for
rename detection. (Currently that is done with
-M -C --find-copies-harder, a generic "there are renamed things"
search, rather than with a very specific rule that may be cheaper
to examine than these generic rules.)

> No, I was going in another direction. For example, if a
> tree-entry contains { file-guid, file-name, file-sha, ... }
> then when diffing any 2 commits, you can match up files
> (and folders) by their guids. Renames pop out trivially when
> their file-names don't match. File moves pop out when the
> file-guids appear in different trees. Adds and deletes pop
> out when file-guids don't have a peer. (I'm glossing over some
> of the details, but you get the idea.)

How do you know when a guid needs adaptation?
(cf. origin/jt/packmigrate)

If a commit moves a function out of a file into a new file,
an ideal version control system could notice that the function
was moved and still attribute it to the original authors by
ignoring the move commit. Another series in flight could have
modified that function slightly (say, fixed a bug), which makes
it hard to reason about these things.

With guids I imagine the new file gets a new guid, so that
tracking the function becomes harder?

> To address Junio's
> question, independently added files with the same name will
> have 2 different file-guids. We amend the merge rules to
> handle this case and pick one of them (say, the one that
> is sorts less than the other) as the winner and go on.
> All-in-all the solution is not trivial (as there are a few
> edge cases to deal with), but it better matches the (casual)
> user's perception of what happened to their tree over time.

The GUID would be made up at creation time, I assume?
Is there any input other than the file itself?

I initially assumed there is. In that case, having a GUID in the
tree would quickly divorce us from the notion of a "content
addressable file system": we could both create the same tree
locally (containing the same blobs), and yet the trees would have
different names because they have different GUIDs in them, which
I'd find undesirable.

> It also doesn't require expensive code to sniff for renames
> on every command (which doesn't scale on really large repos).

I wonder if the rename detection could be offloaded to a server
(which scales) that provides a "hint file" to clients, such that
the clients can then cheaply make use of these specific hints.
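
To check that I understand the guid proposal and my concern about
it, here is a minimal Python sketch. Everything in it is made up
for illustration: Entry, new_guid, diff_by_guid and tree_id are
hypothetical names, the tree is a flat list of paths (so a "move"
between directories and a "rename" collapse into one case), and
the hashing is a toy, not Git's real tree object format.

```python
import hashlib
import uuid
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    guid: str  # hypothetical per-file identifier; not a real Git field
    path: str
    sha: str   # content hash of the blob


def new_guid() -> str:
    # Assumption: the guid is random and made up at file creation time.
    return str(uuid.uuid4())


def blob_sha(data: bytes) -> str:
    # Toy content hash; real Git also hashes a "blob <size>" header.
    return hashlib.sha1(data).hexdigest()


def diff_by_guid(old: list[Entry], new: list[Entry]):
    """Match entries of two trees by guid instead of by path or content."""
    old_by_guid = {e.guid: e for e in old}
    new_by_guid = {e.guid: e for e in new}
    changes: list[tuple[str, Optional[str], Optional[str]]] = []
    for guid, o in old_by_guid.items():
        n = new_by_guid.get(guid)
        if n is None:
            changes.append(("delete", o.path, None))    # guid has no peer
        elif n.path != o.path:
            changes.append(("rename", o.path, n.path))  # same guid, new path
        elif n.sha != o.sha:
            changes.append(("modify", o.path, n.path))
    for guid, n in new_by_guid.items():
        if guid not in old_by_guid:
            changes.append(("add", None, n.path))       # guid has no peer
    return sorted(changes, key=lambda c: (c[0], c[1] or "", c[2] or ""))


def tree_id(entries: list[Entry], include_guid: bool) -> str:
    """Toy tree name: a hash over the sorted entries of the tree."""
    h = hashlib.sha1()
    for e in sorted(entries, key=lambda e: e.path):
        fields = (e.guid, e.path, e.sha) if include_guid else (e.path, e.sha)
        h.update("\0".join(fields).encode() + b"\n")
    return h.hexdigest()
```

With this, a rename really does fall out of a dictionary lookup.
But if two people each add the same a.c locally and each call
new_guid(), tree_id(..., include_guid=False) agrees for both of
them while tree_id(..., include_guid=True) does not, which is the
loss of content addressability I am worried about.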