On Sun, Jul 13, 2008 at 3:24 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:

<explanation of the git log traversal machinery snipped>

> In order to follow renames reliably in a merge heavy history, you need to
> keep track of the pathname the file you are interested in appears as _in
> each commit_. As you traverse down the history, you pass down the
> pathname to the parent you visit, so while you are traversing from 'x' to
> earlier 'x', you will keep following "git-gui/git-gui.sh", while you
> traverse down to 'o', you will inspect "git-gui.sh".
>
> The data structure the revision traversal machinery uses does not support
> this "path-per-commit" natively.

Would it be possible to go for a slightly less complicated approach and,
instead of replacing the tracked file, append it? We already have a list
of files we are tracking, so I assume the data structure does support
that. That would run the risk of tracking too much (e.g., you rename
a.txt => b.txt, and then later on create/rename a new a.txt, which is then
tracked as well).

> This is the reason "git-blame" uses its own traversal engine. It keeps
> track of <commit, path> pairs so that it can mark which line came from
> what path in what commit. When copy/move detection are used, we can even
> notice that the contents we are interested in came from more than one file
> in the same commits, and the data structure supports it (i.e. it is not
> just a pointer to a single string from "struct commit").

So what could be done is to use a blame-like mechanism that invokes rename
detection on each interesting commit and then records that information?
Purely hypothetical though, since I know neither piece of code and have no
time to do so.

> For the purpose of "git log" traversal and the "file renames" people
> usually talk about, this is overkill; you should however be able to
> backport the basic idea to revision machinery, if you really cared.
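
To make the "path-per-commit" idea quoted above a bit more concrete, here
is a minimal sketch in Python. It is not how git's revision machinery or
git-blame is implemented; the toy HISTORY table, the commit names (x1, x2,
o1, o2, merge) and the follow() helper are all made up for illustration.
It only shows the idea of handing a possibly different pathname down to
each parent while walking a history that contains a merge.

```python
from collections import deque

# Hypothetical toy history: commit -> list of (parent, renames) edges.
# "renames" maps a path as seen in the commit to the path it has in
# that particular parent (empty when nothing was renamed on that edge).
HISTORY = {
    "merge": [("x2", {}), ("o2", {"git-gui/git-gui.sh": "git-gui.sh"})],
    "x2": [("x1", {})],
    "x1": [],
    "o2": [("o1", {})],
    "o1": [],
}


def follow(history, start, path):
    """Walk from start towards the roots, carrying the path per commit."""
    seen = set()
    queue = deque([(start, path)])
    visited = []
    while queue:
        commit, name = queue.popleft()
        if (commit, name) in seen:
            continue
        seen.add((commit, name))
        visited.append((commit, name))
        for parent, renames in history[commit]:
            # Pass down the name the file has in this parent.
            queue.append((parent, renames.get(name, name)))
    return visited


for commit, name in follow(HISTORY, "merge", "git-gui/git-gui.sh"):
    print(commit, name)
```

In this toy, the 'x' side keeps being inspected as "git-gui/git-gui.sh"
while the 'o' side is inspected as "git-gui.sh", which is the behaviour
described in the quoted text. A real implementation would have to run
rename detection on each edge instead of reading a prepared table.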

Right, that'd teach git log how to follow across renames in an intelligent
manner that also works for non-linear histories, at the cost of using more
memory and CPU?

> In a real history, "file rename" is a very ill defined concept and is not
> always useful in practice. I did a fairly detailed analysis on one
> real-world history more than two years ago, which is found here:
>
> http://thread.gmane.org/gmane.comp.version-control.git/13746/focus=13769

Aye, I agree that a 'rename' is hard to define and that a lot of effort
could be put into supporting 'renames' that are not trivial (e.g., more
complex than 'git mv foo.txt bar.txt').

> In our own "git.git" history, the evolution of what finally landed in
> revision.c is interesting. The interesting part of content movement never
> involved any file renames --- only bits and pieces migrated over across
> many files. That is not something "file rename tracking", even with an
> extension to the revision traversal machinery to keep one path per commit
> to record the file you are interested in, can ever give meaningful
> explanation of the history. You need a lot more fine grained "blame"
> traversal machinery for that.

This makes sense, but using the blame traversal machinery is overkill for
what I am interested in. What I think would be a good goal is supporting
the subtree merge strategy. It would be nice if
'git log --follow-subtree-merge refspec -- filefilter' or such would Just
Work (TM). Perhaps the hunk-tracking I am working on with Dscho could help
make 'git log --numstat' more accurate. Those two combined (git log being
able to follow across subtree merges and being able to recognise hunks
being moved) would be all that I need.

--
Cheers,

Sverre Rabbelier