Re: [RFC PATCH] Re: Empty directories...

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Fri, 20 Jul 2007 14:48:51 -0700 (PDT)

On Fri, 20 Jul 2007, Johan Herland wrote:
> 
> Does this mean that you are firmly opposed to the concept of storing 
> directories in the index/tree as such, or that you are only opposed to 
> (some of) the implementation ideas that have been discussed so far?

I've already sent out a *patch* to do so, for chissake. It handled all 
these cases perfectly fine, as far as I know, but I didn't test it all 
that deeply (and made it clear when I sent that patch out).

In fact, in this whole pointless discussion, I think I'm so far the only 
one to have done anything constructive at all. Sad.

So here's my standpoint:

 - people who use git natively might as well use the ".gitignore" trick. 
   It really *does* work, and there really aren't any downsides. Those 
   directories will stay around forever, until you decide that you don't 
   want them any more. Problem solved.

   Sure, if you export the git archive into some other format, you might 
   well want to do something about the ".gitignore" files (like just 
   delete them, since they won't be meaningful in an SVN environment, for 
   example, but you might also just convert them into SVN's "attributes" 
   or whatever it is that SVN uses to ignore files).

 - If you don't use git natively, but just to track another thing, you 
   could easily use the patches that I already sent out. Yes, they need 
   more testing. Yes, you'd also probably like some user interface updates 
   (notably "git add/rm" should be taught about directories).

   And yes, I probably (almost certainly) didn't handle all cases, but the 
   patch I sent out was actually a working one. It really *did* pass my 
   trivial tests.

But once you start tracking empty directories *without* a .gitignore file, 
some things fall out of that:

 - git really *really* is designed to track "snapshots in time". You 
   generate history from these snapshots. This is a very fundmanetal 
   issue, and a lot of people seem to have trouble understanding the 
   deeper implications.

   For example, git and hg may look similar, but git tracks "snapshots in 
   time", and hg tracks "file histories tied together in snapshots". That 
   really is a fundamentally different thing. 

   And one of the fundamental results of git's approach is that content is 
   content. There is *never* any notion of "history".  A snapshot really 
   is just that: it's a standalone thing. It *has* no history. The history 
   comes entirely from outside.

   This means that the whole notion of "this directory will not go away 
   because I added it explicitly" is a totally broken notion in git. It 
   has a notion of "history" - something that simply DOES NOT EXIST, 
   unless you seriously break the whole notion of "snapshots in time".

   In other words, when I say that git is a "content tracker", I'm 
   serious. It tracks nothing *but* content. If some concept doesn't exist 
   in the working tree, git doesn't track it. If it cannot be seen in the 
   filesystem, it doesn't exist.

 - Contrast this with a lot of totally broken SCM's, that track "history" 
   of files. As a result, they have absolutely *horrid* merge problems, 
   because you can no longer just merge things in the working directory, 
   and "the result" is the result. No, if you track history, you now have 
   to tell the SCM about how the *history* moved, not just the content.

So this is why git MUST NOT make the difference between

 - a directory was was created explicitly and then had a few files added 
   to it, and then had those files deleted from it

and

 - we added a few files, we removed them

The end result MUST BE the same, because the  state IN THE WORKING TREE is 
the same!

If the contents are the same, the end result must be the same. It's that 
simple. And it all comes down to: "git tracks contents".

Now, having said that, it doesn't matter *what* the end result is, as long 
as it's the same for both cases. What we do now is that when the files go 
away, the directory is no longer tracked.

But we *could* say that when we remove files, we always add back the 
directory they were in if that directory still exists in the filesystem.

See? Both are consistent with the "git tracks contents" notion. The only 
thing that is *not* consistent with that notion is to have a flag that we 
carry along that says "keep this directory". That's no longer content, and 
now you'd be tracking some internal SCM history instead. And that is a 
mistake. It may sound like a small mistake (and it is), but down that path 
lies madness. It's much better to teach people _why_ git doesn't do it, 
than to say "ok, git tracks content, but we have this special case where 
we also track something else, namely a git internal "stickiness" notion".

SCM is too important to play games with. Git gets things right, and I 
doubt people really _realize_ that the "tracks content" is why git is so 
much better, and why git can do merges so much faster and more reliably 
than anybody else.

So the rule really *must* be:

 - if two trees look the same in the filesystem, they *must* have the same 
   git SHA1, because by definition, they have the same content.

Anything that breaks that very simple statement is fundamentally broken.

			Linus

PS. I realize that nobody actually seems to be writing code, and that this 
is a "paint the bike shed" discussion for everybody else, but just in case 
there are people who don't just masturbate about the color of the shed, 
I'd like to point out that we really *do* need to enhance the "diff" rules 
too, so that you can express the changes in a tree as a diff too. Because 
if we track empty directories, then we need to be able to also *show* the 
difference between a tree that has an empty directory, and one that does 
not.
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html