On Sun, 26 Feb 2012, Federico Galassi wrote:
> On 26/feb/2012, at 12:29, Jakub Narebski wrote:
>
>> Would you mind if this discussion was moved to git mailing
>> list (git@xxxxxxxxxxxxxxx), of course always with copy directly
>> to you? There are people there that can answer your questions
>> better.
>
> No problem.
>
>> On Sun, 26 Feb 2012, Federico Galassi wrote:
>>> Hello, i think you're the author of these comments:
>>> http://news.ycombinator.com/item?id=616610
>>>
>>> I'm doing educational work on git based on the parable (talks,
>>> articles, etc..) and i'd like to improve on the real reason
>>> for a staging area.
>>>
>>> My question basically is: why is it really needed for merging?
>>> I mean, given the fictional git-like system of the parable,
>>> if I need to merge 2 snapshots i could:
>>>
>>> 1) search the commit tree for a base point [...]
>>> 2) compare the diffs between the snapshots and the base point snapshot
>>> 3) if a conflict happens (change in the same line), just leave
>>>    something in the working dir to mark the conflict. For example,
>>>    keeping it simple, the system could reject a new commit until
>>>    the markers of the conflict are removed from the conflicting file.
>>>
>>> Couldn't it just work this way?
>>
>> Well, it could; that is how many if not most of other version control
>> systems work.
>>
>>
>> There are (at least!) three problems with that approach. First, sometimes
>> it is not possible to "leave something in the working dir to mark the
>> conflict". Take for example case where binary file (e.g. image) was
>> changed, and textual 3-way diff file-merge algorithm wouldn't work.
>>
>> Second, what to do in the case of *tree-level* conflict, for example
>> rename/rename conflict, where one side renamed file to different
>> name (moved to different place) than the other side. There are no
>> conflict markers for this...
>>
>> Third, what about false positives with detecting conflict markers,
>> i.e. the case where "rejecting new commit until conflict markers are
>> removed", for example AsciiDoc files can be falsely detected as having
>> partial conflict markers, and of course test vectors for testing conflict
>> would have to have conflict markers in them.
>
> Ok, it's clear to me that the markers in file approach is just a little
> bit too simple. Do you see any concrete advantage in the staging area
> compared to, say, tree conflict metadata in the working dir and maybe
> a dedicated smart "resolve conflict" command?

First, the working directory isn't the best place for such _local_
information. What if you accidentally delete it? It is not, and should
not be, committed to the repository, so there is no way to undelete it
except by redoing the merge and losing all your progress so far in
resolving merge conflicts. It is much better to put such information
somewhere in the administrative area[1] of the repository.

Second, if we already have a staging area where we store information
about which files are tracked, plus a bunch of per-file metadata such as
modification time for better performance, why not also use it to store
information about a merge in progress?

[1]: Name taken from "Version Control by Example" (free e-book)
     by Eric Sink.

There is also a thing very specific to Git, namely that "git add" adds
the current content of a file to the object database of the repository
(though modern git also has "git add --intent-to-add", which works like
adding a file in other version control systems)... and you have to store
a reference to the newly created object somewhere so that it doesn't get
garbage-collected.

>>> Can you mention other situations in which the pattern "files to be added"
>>> is either mandatory or really helpful?
>>
>> Note that any version control system must have a kind of proto-staging
>> area to know which files are to be added in next commit.
>>
>> If you do
>>
>>   $ scm add file.c
>>
>> then version control system must save somewhere that 'file.c' is to be
>> tracked (to be added in next commit).
>
> Yes, the fictional vcs just tracked all the files in the working dir.
> Being selective on which file to track is of course another interesting
> feature.

IRL it is a _necessary_ feature. One of the more common, if not the most
common, applications of a version control system is managing source
files for a computer program. There you have object files, executables
and other _generated_ files which shouldn't be put in version control,
as well as backups created by your editor / IDE (e.g. "*~" files in the
Unix world, "*.bak" files in the MS Windows world). Not to mention files
which you have added to the working directory but which are not ready to
be added to a new commit.

-- 
Jakub Narebski
Poland