Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> writes:

> On 9/29/2017 7:12 PM, Johannes Schindelin wrote:
>
>> Therefore, it would be good to have a way to tell Git about renames
>> explicitly so that it does not even need to use its heuristics.
>
> Agreed.
>
> It would be nice if every file (and tree) had a permanent GUID
> associated with it. Then the filename/pathname becomes a property
> of the GUIDs. Then you can exactly know about moves/renames with
> minimal effort (and no guessing).

I actually like the idea of having a mechanism where the user can
give a hint to influence, or an instruction to dictate, how Git
determines "this old path moved to this new path" when comparing
two trees.

A human would not consider a new file (e.g. a header file) that
begins with a few dozen commonly-seen boilerplate lines (e.g. a
copyright statement) followed by several lines unique to the new
contents to be a rename of a disappearing old file that begins with
the same boilerplate followed by several lines that are different
from what is in the new file. But Git's algorithm would give equal
weight to all of these lines when deciding how similar the new file
is to the old file, and can misidentify a new file as a rename of
an old file that is unrelated.

Even when Git can and does determine the pairing correctly, it would
be a win if we do not have to recompute the same pairing every time.

So both as a hint and as a cache, such a mechanism would make sense
[*1*].

But "file ID" does not have any place to contribute to such a
mechanism.

Each of two developers working on the same project in a distributed
environment can grab the same gist and create a new file in his or
her tree, perhaps at the same path or at a different path.
At the time of such an addition, there is no way for each of them
to give these two files the same "file ID" (that is how the world
works in the distributed environment after all). Which "file ID"
should survive when their two histories finally meet and result in
a single file after a merge?

A file with "file ID" may not be renamed but may be copied and
evolve separately and differently. Which one should inherit its
original "file ID", and how does having "file ID" help us identify
that the other one is equally related to the original file?

These two are merely examples of the problems that "file ID"s would
cause while solving "only" what can be expressed in "git diff -M"
output (the latter illustrates that it does not even help showing
"git diff -C").

And when we stop limiting ourselves to whole-file renames and copies
(which can be expressed in "git diff" output) but also want to help
finer-grained operations like "git blame", we'd want to have
something that helps in situations like a single file's contents
being split into multiple files and multiple files' contents being
concatenated into a single new file, both of which happen during
code refactoring. "file ID" would not contribute an iota in helping
these situations.

I've said this a number of times, and I'll say this again, but one
of the most important messages in our list archive is gmane:217 aka

    https://public-inbox.org/git/Pine.LNX.4.58.0504150753440.7211@xxxxxxxxxxxxxxx/

I'd encourage people to read and re-read that message until they can
recite it by heart. Linus mentions "CVS annotate"; the message was
written long before we had "git blame", and it served as a guide
when designing how we dig contents movement in various parts of the
system.


[Footnote]

*1* There are many possible implementations; the most obvious would
be to record a pair of blob object names and instruct Git, when it
sees one side of a pair disappearing and the other side of the pair
appearing, to take the pair as a rename.
And that would be sufficient for "git log -M".

Such a cache/hint alone, however, would not help much in "git merge"
without further work, as we merge using only the tree state of the
three points in the history (i.e. the common ancestor and two tips).
merge-recursive needs to be taught to find the renames at each
commit it finds throughout the history from the ancestor and each
tip, and carry its findings through, if it wants to take advantage
of such a hint/cache.
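
The blob-pair hint described in the footnote could be sketched
roughly as follows. This is only an illustration in Python, not
Git's code: the flat "path -> blob object name" dictionaries, the
set of recorded pairs and the function name hinted_renames() are all
assumptions made up for this example.

```python
from typing import Dict, List, Set, Tuple

Tree = Dict[str, str]   # hypothetical flat tree: path -> blob object name
Pair = Tuple[str, str]  # recorded hint: (old blob name, new blob name)


def hinted_renames(old: Tree, new: Tree, hints: Set[Pair]) -> List[Tuple[str, str]]:
    """Pair disappearing paths with appearing paths via recorded blob pairs.

    No similarity heuristic is involved: a (deleted, added) pair is
    reported as a rename only if its two blob names were recorded.
    """
    # Paths present on only one side of the comparison.
    deleted = {path: blob for path, blob in old.items() if path not in new}
    added = {path: blob for path, blob in new.items() if path not in old}

    renames: List[Tuple[str, str]] = []
    used: Set[str] = set()  # each appearing path is matched at most once
    for old_path in sorted(deleted):
        for new_path in sorted(added):
            if new_path in used:
                continue
            if (deleted[old_path], added[new_path]) in hints:
                renames.append((old_path, new_path))
                used.add(new_path)
                break
    return renames


# Made-up object names; "boilerplate.h" is a new, unrelated file.
old_tree = {"Makefile": "b0", "old.h": "b1"}
new_tree = {"Makefile": "b0", "new.h": "b2", "boilerplate.h": "b3"}
print(hinted_renames(old_tree, new_tree, {("b1", "b2")}))
```

Note that the sketch only ever compares two trees, which is why
something like it would be enough for "git log -M" but not for
"git merge" without the further work described above.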