Jakub, this mail is too long already, and it does not make sense to
tack a changed proposal onto its end, since readers will be exhausted
by the time they get there.

So I'll instead tack a followup onto the "big picture" mail, where I
outline a modified approach which is presumably easier to understand
and completely backwards-compatible, incorporating your feedback.

There is probably little sense in wasting your time on a detailed
response: feel free to point out where you don't see me making sense.
I have no problem with people coming to different conclusions than I
do, but I would prefer it if it is not because they consider me a
raving lunatic, but because they have different opinions regarding
the details.

"I can follow you, but I disagree with your conclusion" is perfectly
fine for now since I am going to propose something else, anyway.

Thanks for the feedback.  It gave me some good ideas.

Jakub Narebski <jnareb@xxxxxxxxx> writes:

> On Sun, 22 July 2007, David Kastrup wrote:
>> Jakub Narebski <jnareb@xxxxxxxxx> writes:
>>> David Kastrup wrote:
>>>
>>>> I must be really bad at explaining things, or I am losing a fight
>>>> against preconceptions fixed beyond my imagination.
>
> Or you are wrong...

Well, there is little reason for you to take my word on it, but I
happen to have a history of designing and implementing systems where I
have been responsible for every single byte: bootloader, firmware,
applications, target compiler, assembler, whatever.  I was exposed to
Unix and worked with it for several years before Linux even existed.
I also have a track record of being not exactly stupid.

So I pretty much can rule out that I am wrong on the factual side.
But where I may be wrong is in estimating how obvious the design can
appear to others, and how useful and maintainable for others it may be
in the long run.

Linus says "code talks", but that's actually not half the story.
If my code says that it works and the evidence is there, but nobody is
able to understand _why_ it works, it has no place in a project where
I am not permanently around.  If smart people don't get what I am
talking about, it does not matter that the patch is surprisingly
well-contained: it will be a maintenance nightmare because people will
never figure out why something stopped working after some particular
change.

>> I disagree here.  The object database _can_ represent an _empty_
>> directory that has been added explicitly, because up to now no
>> operations existed that actually left an empty tree.  But it can't
>> distinguish a _non_-empty directory that has been added explicitly
>> from non-empty directory that has not been added explicitly.
>
> True. I forgot about that.

Thanks.  It is almost a revelation that anybody can agree on any point
with me at the moment.

> IMHO it would be best to first provide plumbing infrastructure (as
> e.g. it was the case of submodule support), then add option to
> git-update-index to change the "stickiness"/"autoremoval" status of
> a directory (of a tree), and _last_ think about how to change the
> porcelain (git-add and git-rm).

Sure.  It does no harm to think about reducing the amount of breaking
porcelain, though.

> [...]
>
>> And a perfectly consistent way is to make those trees with an
>> explicitly added directory _non-empty_, by virtue of putting a file
>> "." in them.  This file, of course, exists in every physical
>> directory, but we may or may not decide to let it be tracked by
>> git, using the gitignore mechanism on the pattern ".".  Perfectly
>> expedient.
>
> Here we disagree. I think putting "." in a tree as marker of having
> it not be automatically deleted when empty, as opposed to marking
> tree using filemode in the parent, is not a good idea.

Well, "not a good idea" is a far step forward from "stupid idiot
babbling nonsense", so we may make progress towards actually being
able to _weigh_ different options.
I can actually associate with "not a good idea", not least because
nobody else seems to get the idea, and that makes it infeasible for
maintenance.

So I'll address some points and then propose a different way of
implementing what will in the end amount to rather similar semantics,
but with a different view of looking at those semantics, one that
corresponds well with the implementation.

> The only advantage to the "." idea is that it can use gitignore
> mechanism (both in-tree .gitignore, tracked or not, and info/exclude
> file). But I also think that the fact that gitignore mechanism is
> recursive is more of disadvantage than advantage.
>
> First, it is _not_ consistent. Working directory trees _always_ have
> '.' in them, while trees would have or would have not it, depending
> if they would be "sticky" or "autoremoved".

Let me point out again that this inconsistency is already present in
the difference of tracked and untracked _files_: they are always in
the working directory, while trees have or not have them, depending on
whether they are "registered" or "not".

There is no inconsistency involved here, but it seems to make people
_very_ uncomfortable to factor out the "stays around even if empty"
functionality and call it "dir/." from the "can hold content"
functionality which is in effect called "dir/", and basically
associate tracked physical existence just with the former.

The recursiveness of the gitignore mechanism has the advantage that
when maintaining a large repository with actual or logical
subprojects, one does not need to pick a single policy for all
subprojects.  I think that is quite important.  It could possibly be
achieved with some other method of having per-subproject
configuration, but I see little wrong in using what is there and
documented already.

> Second, the "easy implementation" is anything but easy. "git add ."
> as a way to mark directory as "sticky" is not backward compatible:
> currently it means to add _all contents_ of current directory.
> Implementation is tricky: as we have seen trying to unlink '.' or
> create '.' can unfortunately succeed on [some Sun OS, and UFS
> filesystem] (which follows POSIX stupidly to the letter) f**king up
> the filesystem.

I was not suggesting actually leaving any such calls in place: after
all, they would presumably lead to error messages.  But I agree that
this could lead to nasty surprises when somebody with a legacy version
of git worked with a repository containing "." as explicit entries of
some file type.

> The alternative proposal of adding "magic mode" to mark directory as
> "not remove when empty" is largely tested; it is very similar to the
> subproject support.

Good.  Because it is what I converged to last night.

> Third, is contrary to the git philosophy of tracking contents.
> "Stickiness" is an attribute; the fact that directory is explicitly
> tracked or not does not change contents of a directory. Compare to
> 'blob' which contains only contents of a file: not a filename, not a
> pathname, not [subset of] filemode.
>
> Fourth, is very artificial. What would you put for filemode for '.'?
> 040000 (i.e. directory)?

Taken already.  By something very artificial, namely a tree...  Yes,
this was a wart in my proposal.

> What would you put for sha1? Sha1 of an empty directory?

Some fixed value.  Everywhere the same.  Not really relevant.

>> That basically implies that no information about directories could
>> be tracked in the repository.  And yes, we need appropriate
>> information in the index.  Again, the information whether a
>> directory was added explicitly.
>
> Whether directory is automatically managed by git (automatically
> removed or untracked). But we need directory entry in index for
> git-diff, for example to recognize if there is or there is not empty
> directory, or if a directory is automanaged or not.
One conclusion that I have come to (and I think I am in agreement with
Linus here) is that the information "empty or not" is actually useless
separately: when I add files below a directory to the repository, the
directory _can't_ be empty.  And git has no way of knowing whether it
is non-empty because I wanted the directory to be there, or whether it
is non-empty because I could not have checked in the files into the
tree below it otherwise.

>> And the repository is a versioned and hierarchically hashed version
>> of the index, but its trees contain _no_ information that is not
>> already inherently represented by the files alone. [...]
>
> The above sentence is nonsensical. Index is helper for repository,
> and can be derived from repository. Not vice versa.
>
> Trees do contain information which is not inherently present by the
> blobs.

Could you give examples for such information?  As long as we are not
talking about _history_, I am at a loss at what else you mean.  File
names and permissions?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum