On Sat, 21 Oct 2006, Jan Hudec wrote: > > [ On not moving files that weren't moved originally, but whose > directories were moved ] > > I still consider it a bug, but different problems of the file-id > solution have already been described in this thread that I consider bugs > as well. > > Besides I start to think that it should be actually possible to solve > this case with the git-style approach. It's certainly _possible_ to figure out, but one reason git does what it does is that it's just simpler (ie just ignore the whole "directory move" situation entirely, and just consider it to be "many files moved"). Another reason is that this really is an ambigious case. When the directory was moved, the file in question really didn't exist. So when it was created independently of the move, it really _is_ somewhat ambiguous whether the intention was to move it with the other files or whether the new creation point is the right one. I think that for a human, the details would likely be obvious (and I suspect that in most cases it would indeed move with the directory). But it really isn't totally clear: what does moving a directory imply for the future? Does it imply that the directory should never exist in the future, or does it just imply that the _current_ contents move? Git "tends to" have a policy of not caring about directories at all. For example, git will not track an empty directory by default. You _can_ make it track one in your commits (the data structures support it), but you're really just better of just thinking of git as tracking individual files, and nor really directories. So as far as git is concerned, "directories" mostly don't really have any existence on their own, they only exist as paths to reach files. In that kind of mindset, renaming a directory really is about renaming the files that are in that directory, and that explains the git behaviour. It may not necessarily be what you expect, but it _is_ consistent, and it's not really "wrong" either. It's just another way of looking at the thing. Also, I'd like to point out that people worry way too much about merges. There are much harder merge conflicts to fix up. If you notice that things didn't go the way you expected in a merge, even if it was done automatically, you can just do a git mv unexpected/directory/file expected/directory/file git commit --amend which basically "fixes up" the automatic merge (that's what the "--amend" means: it means "re-do the last commit with _this_ state instead). (Of course, you could also just make a separate commit to move the file, but I think the "manual fixup of the merge" is just cleaner - just add a note in the commit message to say you fixed it up by hand. When you do your "git commit --amend", it will automatically just give you an editor to edit up the commit message too while you're at it). So again: merges are certainly fairly "hard" from a SCM standpoint, but from a user standpoint, they tend to be not at all as important. I would again argue that more important than the merge itself (which you can trivially just fix up to match your expectations) is to make it easy to later _show_ what happened, ie if you examine the file later, you should be able to see where it came from. (And again, with git, things like "git pickaxe" - think of it as just a "better annotate" - will indeed pick up the similarity, regardless of whether the rename was done manually or automatically as part of the merge - exactly because git only really cares about actual contents). Btw, just to be honest: git _mostly_ thinks in terms of "constant pathname patterns" as opposed to "individual paths that move around". That's at least partly because of how I work. I actually fairly seldom look at an individual file, and tend to much more often look at a group of files, and then it's a _lot_ more convenient to do gitk drivers/usb include/linux/usb* where those argument pathnames are _not_ a set of filenames that we track, but really somethign more generic, namely a "repository pathname subset" which is constant. The above will show the _subset_ of the kernel repository history that is relevant for all the named pathnames, but the pathnames are _fixed_. It won't follow files that move out of the subdirectories: it will show the history as seen from the viewpoint of a certain subset of pathnames. This also extends to things like "git log". So when you do git log kernel/sched.c if you have a "file ID" mentality, you expect the above to follow renames. It doesn't - even though git -can- follow renames, what the above actually _means_ is "show the log for the fixed pathname set that only includes one single path". So if "kernel/sched.c" had originally been called something else, the above wouldn't show the rename at all. It would just show that "oh, this pathname suddenly was created as a new file", because from the viewpoint of that fixed pathname, that's _exactly_ what happens. We've discussed adding a "--follow" flag to tell "git log" to consider the argument to not be a "pathname filter", but a "individual file" kind of thing, and I think there was even a patch for it, but I suspect it hasn't been a big issue, probably partly because you get rather used to the "pathname filter" approach fairly quickly. If you knew what the old pathname was, for example, you could get git to _tell_ you about the rename by doing git log -M -- <set-of-all-pathnames-we're-interested-in-old-included> and git would happily see the renames that happen _within_ that pathname filter (the "-M" is there because by default "git log" doesn't show any patches at all, of course, so if you want to see the rename, you need to tell git so). As a particular example of this behaviour, if you do git log -M kernel/ you'll always see any renames that happen _within_ that subdirectory, but any files that are moved into (or out of) the subdirectory will be considered to be "create" or "delete" events - because you've literally told git to ignore all history that is not relevant to the kernel/ subdirectory (so they really _are_ "create/delete" events as far as that subdirectory is concerned). Is this different from other SCM's? Hell yes. git does a lot of things differently. Is it useful? Again, hell yes. Especially for a maintainer, the ability to talk about pathname _patterns_ is generally much more important than talking about any particular file. [ The pathname thing also means that it's trivial to ask questions like "ok, so what happened to file xyz that I _know_ we used to have, but clearly don't have any more?". You just do "git log -- xyz", and you'll see exactly what you wanted to see. The "--" here (and in a previous example) is because to avoid ambiguity, git requires that if you name files that don't actually exist, you make it clear that they are filenames, not just mistyped revision ID's or something else. ] In general, git gives you the best of both worlds. It knows how to follow individual files if you want to, but by default it uses this much more generic concept of "pathname filters". The default is definitely influenced both by my usage, and my (obviously very strong) opinions on what is more important (and thus the git "mental model"). Linus - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html