I’ve just learned Git. What a wonderful system, thanks for building it. And what an annoying learning experience.

I promised myself to try to remember what made it all so hard, and to write it down in a comprehensive and possibly even constructive fashion. Here it is, for what it’s worth. Read it as the friendly, but somewhat exasperated suggestions of a newcomer. I’d love to help (in the form of submitting patches to the documentation or CLI responses), but I’d like to test the waters first.

So, in no particular order, here are the highlights of my former confusion, if only for your entertainment. Comments are welcome, in particular where my suggestions are born out of ignorance.

Remote (tracking) branches
--------------------------

There are at least two uses of the word *tracking* in Git's terminology.

The first, used in the form “Git tracks a file” (in the sense that Git knows about the file), is harmless enough, and is handled under `git add` below.

But the real monster is the *tracking branch*, sometimes called the remote branch, the remote-tracking branch, or the remote tracking branch. Boy, did that ever confuse me. And, judging from the Git mailing list and the web, many others. There are so many things wrong with how this simple concept is obfuscated by the documentation that I have a hard time organising my thoughts about writing it down. Please, *please* fix this. It was the single most confusing and annoying part of learning Git.

First, the word “tracking”. These branches don’t track or follow anything. They are standing completely still. Please believe me: once you have been led to believe that origin/master tracks a branch on the remote (like a hound tracks its quarry, or a radar tracks a flight), it is very difficult to hunt this misunderstanding down. I believed for a long time that the tracking branch stayed in sync, automagically, with a synonymous branch at the remote.
The CLI and documentation worked very hard to keep me in that state of ignorance. I *know* that my colleague just updated the remote repository, yet the remote branch (or is it the remote tracking branch? or the remote-tracking branch?) is as it always was...? (How could I *ever* believe that? Well, *now* I get it, and have a difficult time recollecting that misunderstanding. *Now* it’s easy.)

Second, the word “remote” as opposed to “local”, a dichotomy enforced by both the documentation and by the output of `git branch -r` (list all remote branches, says user-manual.txt). Things began to dawn on me only when I understood that origin/master is certainly and absolutely a “local” branch, in the sense that it points to a commit in my local repository. (It differs from my other local branches mainly in how it is updated. It’s not committed to, but fetched to. But both are local, and the remote can be many commits ahead of me.)

So, remote tracking branches are neither remote (they are *local* copies of how the remote once was) nor tracking (they stand completely still until you tell them to “fetch”). So remote means local, and tracking means still; “local still-standing” would be a less confusing term than “remote tracking”. Lovely.

Tracking branches *track* in the sense that a stuffed Basset Hound tracks. Namely, not. It’s a dream of what once was.

The hyphenated *remote-tracking* is a lot better terminology already (and sometimes even used in the documentation), because at least it doesn't pretend to be a remote branch (`git branch -r`, of course, still does). So that single hyphen already does some good, and the documentation should be edited to use it consistently. (It did take time for me to convince myself during the learning process that “remote tracking” and “remote-tracking” probably are the same thing, and “tracked remote” something else, abandoning and resurrecting these hypotheses several times.)
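The point that origin/master is a local ref that stands still until fetched can be checked in a throwaway setup. What follows is only a minimal sketch: the names `remote.git`, `colleague`, `me`, and the `commit` helper are made up for illustration, and a bare repository on local disk stands in for the real remote.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: make an empty commit in repository $1 with message $2.
commit() {
    git -C "$1" -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$2"
}

# "The remote": a bare repository whose HEAD names the branch master.
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/master

# A colleague publishes a first commit to the remote's master.
git init -q colleague
git -C colleague symbolic-ref HEAD refs/heads/master
commit colleague "first"
git -C colleague remote add origin "$work/remote.git"
git -C colleague push -q origin master

# My clone gets master and origin/master, both stored locally.
git clone -q remote.git me
before=$(git -C me rev-parse origin/master)

# The colleague updates the remote again.
commit colleague "second"
git -C colleague push -q origin master

# My origin/master has not moved: nothing "tracks" anything yet.
still=$(git -C me rev-parse origin/master)

# Only an explicit fetch moves it.
git -C me fetch -q origin
after=$(git -C me rev-parse origin/master)
remote_tip=$(git -C remote.git rev-parse master)

echo "origin/master before fetch: $still"
echo "origin/master after fetch:  $after"
echo "origin's master:            $remote_tip"
```

The first printed commit ID is the old one even though the remote has already moved on; only after the fetch does it match origin's master.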

And *even if* the word was meaningful and consistently spelt, the documentation uses it to *refer* to different things. Assume that we have the branches master, origin/master, and origin’s master (understanding that they exist, and are different, is another Aha! moment largely prevented by the documentation). For 50 points, which is the remote tracking branch? Or the remote-tracking branch? The remote branch? Which branch tracks which other branch? Does master track anything? Nobody seems to know, and documentation and CLI include various inconsistent suggestions. (I know there have been long and inconclusive threads about this on the Git mailing list, and I learned a lot from seeing other people’s misconceptions mirror my own.)

Granted, I think the term “tracked remote branch” is used with laudable consistency to refer to a branch on the remote. And “remote tracking branch” (with or without the hyphen) more often than not refers to origin/master. It may be that terminology is slowly converging. (To something confusing, but still...)

But to appreciate how incredibly difficult this was to understand, check this, from the Git Community book:

    A 'tracking branch' in Git is a local branch that is connected to a remote branch.

To a new user, who *almost* gets it, this is just a slap in the face. Which one of these is origin/master again? None? (Or rather, it is the confirmation one needs that nobody in the Git community cares much, so the once-believed-to-be-carefully-worded documentation loses some of its authority and therefore the learner can abandon some misunderstandings.)

There probably is a radical case to be made for abandoning the word “tracking” entirely. First, because tracking branches don’t track, and second because “tracking” already means something else in Git (see below). I realise that this terminology is now so ingrained in Git users that conservatism will probably perpetuate it.
But it would be *very* helpful to think this through, and at least agree on who “tracks” what.

In the ideal world, origin/master would be something like “the fetching branch” for the origin’s master, or the “snapshot branch” or the “fetched branch”. (I am partial to “fetching” because it makes that operation a first-class conceptual citizen, rather than pulling, which is another siren that lures newbies into a maelstrom of confusion.)

More radically, I am sure some head scratching would be able to find useful terminology for master, origin/master, and origin’s master. I’d love to see suggestions. As I said, I admire how wonderfully simply and cleanly this has been implemented, and the documentation, CLI, and terminology should reflect that.

The staging area
----------------

The wonderful and central concept of the staging area exists under at least three names in Git terminology. And that’s really, really annoying. The index, the cache, and the staging area are all the same, which is a huge revelation to a newcomer.

This problem could of course be easily fixed by making up your mind. The decision which of the three terms to adopt is somewhat arbitrary, but *staging area* gives the strongest and best metaphor. It also verbs quite well, even though it is not the best, shortest noun. *Index* would have been a good word for the files known to Git (what is now called, sometimes, “tracked files”), and *cache* is terrible in any case.

`git stage` is already part of the distribution. Great.

1. Search for index and cache in the documentation and rephrase any and all of their occurrences to use “staged” (or, if it can’t be avoided, “the staging area”) instead. Say “staged to be committed” often, it’s a strong metaphor.

2. Introduce the alias `git unstage` for `git reset HEAD` in the standard distribution.

3. Duplicate various occurrences of `cached` flags as `staged` (and change the documentation and man pages accordingly), so as to have, e.g., `git diff --staged`.
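
Suggestions 2 and 3 can be tried out today without waiting for the distribution. This is a minimal sketch in a scratch repository: the alias is defined per repository rather than globally, `notes.txt` is a made-up file name, and it assumes a Git release in which `git diff` accepts `--staged` as a synonym for `--cached`.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name Demo
git config user.email demo@example.com

# Suggestion 2 as a repository-local alias (not part of stock Git).
git config alias.unstage 'reset -q HEAD --'

echo one > notes.txt
git add notes.txt
git commit -q -m "add notes"

# Change the file and stage the change.
echo two >> notes.txt
git add notes.txt
staged=$(git diff --staged --name-only)         # files staged to be committed

# Unstage it again; the edit itself stays in the working directory.
git unstage notes.txt
after_unstage=$(git diff --staged --name-only)  # nothing staged any more
unstaged=$(git diff --name-only)                # changed but not staged

echo "staged before unstage: $staged"
echo "staged after unstage:  $after_unstage"
echo "changed, not staged:   $unstaged"
```

Using `git config --global` instead would make the alias available in every repository.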

git status
----------

One of the earliest-to-use commands is `git status`, whose messages are *wordy*, but were initially completely unhelpful to me. In particular,

    working directory clean

Clean? What’s this now? Clean and dirty are Git slang, and not what I want to meet as a new user. The message should inform me that the tracked files in the working directory are equal to their previous commit.

But there are other things wrong with the messages. For example, what about this one, shown even though there’s nothing to commit?

    nothing added to commit but untracked files present (use "git add" to track)

The last parenthesis should set off warning bells already. And what did clean mean with respect to untracked files? And “added to commit”? That sounds like amending. We add to the index or the staging area, don’t we, “ready to be included in the next commit,” so they aren’t added to that commit quite yet?

    changed but not updated:

I’m still not sure what “update” was ever supposed to mean in this sentence. I just edited the file, so it’s updated, for crying out loud! The message might just say “Changed files, but not staged to be committed.”

The meant-to-be helpful “use [...] to update what will be committed” is another can of worms, and I can find at least two ways to completely misunderstand this. Change to “use `git stage <file>` to stage”. (With the new command name it’s almost superfluous.)

Here are some concrete suggestions:

1.      nothing added to commit but untracked files present

   should be

        nothing staged to commit, but untracked files present

   (Comment: maybe “... but working directory contains untracked files.” I realise that “directory” is not quite comprehensive here, because files can reside in subdirectories. But I’d like to be more concrete than “be present”.)

2.      Untracked files:
        (use "git add <file>..." to include in what will be committed)

   should be

        Untracked files:
        (use "git track <file>" to track)

3.      Changes to be committed:
        (use "git reset HEAD <file>..." to unstage)

   should be

        Staged to be committed:
        (use "git unstage <file>" to unstage)

Adding
------

The tutorial tells us that

    Many revision control systems provide an add command that tells the system to start tracking changes to a new file. Git's add command does something simpler and more powerful: git add is used both for new and newly modified files, and in both cases it takes a snapshot of the given files and stages that content in the index, ready for inclusion in the next commit.

This is true, and once you grok how Git actually works it also makes complete sense. “Making the file known to Git” (sometimes called “tracking the file”) and “staging for the next commit” result in the exact same operations, from Git’s perspective.

But this is a good example of what’s wrong with the way the documentation thinks: Git’s implementation perspective should not define how concepts are explained. In particular, *tracking* (in the sense of making a file known to Git) and *staging* are conceptually different things. In fact, the two things remain conceptually different later on: un-tracking (removing the file from Git’s worldview) and un-staging are not the same thing at all, neither conceptually nor implementationally. The opposite of staging is `git reset HEAD <file>` and the opposite of tracking is -- well, I’m not sure, actually. Maybe `git update-index --force-remove <filename>`? But this only strengthens my point: tracking and staging are different concepts, and therefore deserve different terms in the documentation and (ideally) in the CLI.

The entire quoted paragraph in the tutorial can be removed: there’s simply no reason to tell the reader that Git behaves differently from other version control systems (indeed, to take some perverse *pride* in that fact).

Fixing this requires no change to the implementation. `git stage` is already a synonym for `git add`. It merely requires discipline in using the terminology of staging.
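
That un-staging and un-tracking really are different operations can be seen in a scratch repository. This is only a sketch: `essay.txt` is a made-up file name, and `git rm --cached` is used here as one way to make Git forget a file while leaving it on disk, which is an assumption about what “the opposite of tracking” should mean.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name Demo
git config user.email demo@example.com

echo draft > essay.txt
git add essay.txt                      # first add: track and stage at once
git commit -q -m "add essay"

echo more >> essay.txt
git stage essay.txt                    # synonym for git add: stage the change
git reset -q HEAD -- essay.txt         # un-stage: the file is still tracked
tracked_after_unstage=$(git ls-files)

git rm -q --cached essay.txt           # un-track: the file stays on disk
tracked_after_untrack=$(git ls-files)

echo "tracked after un-staging:  $tracked_after_unstage"
echo "tracked after un-tracking: $tracked_after_untrack"
ls essay.txt                           # still present in the working directory
```

After the un-staging step `git ls-files` still lists the file; after the un-tracking step it lists nothing, although the file itself has not been touched.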

Note that it is completely valid to tell the reader, maybe immediately and in a footnote, that `git add` and `git stage` *are* indeed synonyms, because of Git’s elegant model. In fact, given the amount of documentation cruft one can find on the Internet, this would be a welcome footnote.

An even more radical suggestion (which would take all of 20 seconds to implement) is to introduce `git track` as another alias for `git add` (see above under `git status`). This would be especially useful if tracking *branches* no longer existed.

There’s another issue with this, namely that “added files are immediately staged”. In fact, I do understand why Git does that, but conceptually it’s pure evil: one of the conceptual cornerstones of Git -- that files can be tracked and changed yet not staged, i.e., the staging area is conceptually a first-class citizen -- is violated every time a new file is “born”. Newborn files are *special* until their first commit, and that’s a shame, because the first thing the new file (and, vicariously, the new user) experiences is an aberration. I admit that I have not thought this through.