diffcore and directory renames

Yann Dirson <ydirson@xxxxxxxxxx> · Mon, 1 Sep 2008 23:39:04 +0200

I often found myself lost when looking at a diff where a couple of
large dirs where renamed, and a handful of files were modified to take
the rename into account - not a rare situation, I'd say.  In such a
case, the diffs themselves are mostly hidden among numerous rename
entries.

To that, I felt that git ought to know better, and could easily
present a directory rename as such.  Among the advantages of having
such "dir rename detection", aside from readability which we should
not neglect, we'd have more accurate "svn dcommit" for such renames
and, and better decisions made when merging a branch with adds a file
into a dir, with a branch which renames that dir (ok, maybe not that
frequent, but still).

Looking closer at the current behaviour, I realized that git-diff-tree
*is* already able to report such renames in raw mode :

$ ./git-diff-tree 0f1027 -M
0f1027e1aceb4bc5fa682776ab9f72935e2cd1b3
:040000 040000 6f6159f0245784352414ff38ffb68bae80f30bd6 6f6159f0245784352414ff38ffb68bae80f30bd6 R100   ppc     moved

... but patch output breaks this into file moves, and even other
commands which I would have expected to display the same do in fact
not (eg. "git show --raw").  Is there any reason for this ?  I would
have expected git-show to do the same as diff-tree, and it would not
be so much of a problem to introduce a special form of the "rename"
chunk to handle dir renames ?

Since current diff-tree behaviour is based on comparing tree hashs, it
only detects 100%-similarity renames, and naturally diff-index and
diff-files cannot even do that much yet.  However, an optional
factorization pass following rename detection could quite easily
detect if, when a containing directory disappears because of a set of
renames - this is what I have started to look at.

Since what I propose is indeed just a factorization of file renames,
it could be adequate to represent them as such instead of directory
renames, so as to read something like:

|diff --git a/ppc/* b/moved/
|rename from ppc/*
|rename to moved/

This would have the additional advantage of also working when the
target destination was pre-existing, and when merging several dirs
into a single one - while at the same time to mistaking the reader too
much into thinking that git cares about directories so much ;)

Callers should also be careful about interpreting the output when
mixing this behaviour with file filtering, obviously, so it is not
meant to be activated by default.

Does any of this sound sensible ?

On the technical side, I figured out that the best way to plug the
factorization was inside diffcore_rename to take advantage of the fact
it has a list of all renames, that's not going to be published out of
this function - that only left me the "cleanup:" label as insertion
point, not that I find it conceptually very comfortable :)

But if I have understood correctly, diffcore only gets access to
changed filepairs, and does not know about the trees it is comparing.

The most sensible way to go I can think of would be to give to
diffcore enough information (src and dst tree-hash|"INDEX"|"WORKING
COPY") to be able to know for directory existence.  I'd rather check
it does not sound completely wrong to anyone before going into that -
opinions ?

Best regards,
-- 
Yann
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html