On Wed, 9 May 2007, Joshua Ball wrote:
>
> What the heck do these terms mean?
>
> HEAD
> HEAD REF

These are the same thing. HEAD is basically a special local branch,
which usually (but not always) points to one of the local branches.

I say "usually", because you *can* make it an independent branch in its
own right, in which case you are using what is now called a "detached"
HEAD.

But even when HEAD is "detached", and it's thus really an independent
branch in its own right, it's still special: it's the branch that your
current working tree is associated with.

So if you think of "HEAD" as just "current branch", you'll be in good
shape.

> working tree

This is just your files - both untracked (ie you may be building stuff
in the working tree) and tracked (ie the ones that git knows about).

It's *not* necessarily going to match the state that HEAD describes:
HEAD describes the last *committed* state, while your working tree
obviously can have changes to the tracked files (along with files that
aren't tracked at all), but the working tree state is certainly
_associated_ with HEAD, in that HEAD would point to the most recent
commit that the working tree is all about.

> object

This is just an internal git term. It's how git stores all revision
history information - as a set of objects in a content-addressable
filesystem.

As a pure user, you generally never need to worry about this term,
although you might run into it if you have corruption and run
"git fsck", and it starts talking about corrupt or missing objects.

> branch

A "branch" is just any "tip of development". It's *literally* defined
by its name (which git doesn't care about, but you do), and the name of
the top-most commit (the SHA1) of that development series. That SHA1 is
all that git really cares about - the name is purely for your enjoyment
and to clarify what the branch is about.
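To make the "HEAD usually points to a branch, and a branch is a name
plus a SHA1" idea concrete, here is a toy model in Python. It is only a
sketch: the names `refs` and `resolve_head` and the all-ones/all-twos
SHA1 values are made up for illustration, and this is not how git itself
is implemented. The only real detail borrowed from git is the
"ref: refs/heads/<name>" text that an attached HEAD contains.

```python
# Toy model only: `refs`, `resolve_head` and the fake SHA1 values are
# assumptions made for illustration, not git's real implementation.

# A branch is just a name mapped to the SHA1 of its top-most commit.
refs = {
    "refs/heads/master": "1" * 40,
    "refs/heads/topic": "2" * 40,
}


def resolve_head(head, refs):
    """Return the commit SHA1 that HEAD currently describes."""
    prefix = "ref: "
    if head.startswith(prefix):
        # Usual case: HEAD points at one of the local branches.
        return refs[head[len(prefix):]]
    # Detached case: HEAD names a commit directly, with no branch name.
    return head


attached = "ref: refs/heads/master"
detached = "2" * 40

print(resolve_head(attached, refs))
print(resolve_head(detached, refs))
```

Either way you end up with a commit; the only difference is whether a
branch name sits in between.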
Git can track an arbitrary number of branches, but your working tree
would be associated with just one of them, and HEAD points to that
branch.

The default branch is called "master", but that doesn't really have any
special meaning per se, and as mentioned, HEAD might even be a detached
branch and not associated with any "real" branch at all!

> merge

That's the act of bringing in the contents of another branch (possibly
from an external repository) into your current branch. If you merge
from something external, you need to "fetch" that other branch first,
and the combination of "fetch+merge" is called a "pull".

It sounds like you may have never worked with branches before, in which
case you can just ignore *all* of this. Git will set up one branch for
you at "git init" time (the "master" branch) and you don't actually
ever have to use any more than that one branch, in which case you can
literally ignore everything about branches and merging.

> master

See above: it's just the default name of the initial branch. It has no
other meaning - git itself doesn't care about branch names at all, and
it's literally nothing more than a "default branch name".

> commit (as in the phrase "bring the working tree to a given commit")

Any development series is just a series of "commits". They point to the
"parent" commit(s) and can thus form a series (or more generally a DAG:
directed acyclic graph).

So a "branch" is really just a named pointer to a commit, and that
commit will in turn point to its parent commit, which will point to its
parent etc. Which is why I started by explaining a branch as a "tip of
development", because you'd see a branch as the top-most commit that it
points to, and you'd normally *change* the branch by committing to it,
which will create a new commit (and move the branch to point to it),
and make that new commit point to the old commit as its parent.
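The "commit points to its parent, branch points to the top-most commit"
structure can be sketched in a few lines of Python. This is a toy model
under stated assumptions: `commits`, `branches`, `commit()` and
`history()` are hypothetical names, and the commit id here is a SHA1 of
a made-up text format (real git hashes a different, richer commit
object).

```python
import hashlib

# Toy model only: these names and the id format are assumptions for
# illustration, not git's real data structures.
commits = {}   # commit id -> {"parents": [...], "message": ...}
branches = {}  # branch name -> id of the top-most commit


def commit(branch, message):
    """Create a commit on `branch` and move the branch to point to it."""
    parent = branches.get(branch)
    parents = [parent] if parent else []
    body = "parents:" + ",".join(parents) + "\n" + message
    cid = hashlib.sha1(body.encode()).hexdigest()
    commits[cid] = {"parents": parents, "message": message}
    branches[branch] = cid  # the branch now names the new tip
    return cid


def history(branch):
    """Walk from the branch tip back through first parents."""
    messages = []
    cid = branches[branch]
    while cid is not None:
        messages.append(commits[cid]["message"])
        parents = commits[cid]["parents"]
        cid = parents[0] if parents else None
    return messages


commit("master", "A")
commit("master", "B")
branches["topic"] = branches["master"]  # new branch: one new pointer
commit("topic", "C")

print(history("master"))  # ['B', 'A']
print(history("topic"))   # ['C', 'B', 'A']
```

Note that creating "topic" copied nothing: there are still only three
commits in total, and both branches share A and B.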
One of the best ways to visualize this is probably to just do

 - clone git itself if you haven't already

        git clone git://git.kernel.org/pub/scm/git/git.git git

 - use "gitk" to see the commit history and see the branches as
   pointers into that commit history. With "--all", it will show you
   all branches, and the "-d" shows the commit history in date order,
   so while it's a bit messier than the default cleaned-up format that
   tries to show branches on their own, it's perhaps also a bit more
   instructive:

        gitk --all -d

In particular, you should see a commit that has both a green-boxed
"master" pointer pointing to it, and a "remotes/origin/master" (colored
in a mixed brown/green box).

Those are examples of branches: the "master" branch is your local (and
normally current) branch, while the "remotes/origin/master" thing is a
so-called "remote branch", which means that you cannot check it out,
but you can see it and you can update it by fetching new versions from
the remote.

> Is there a difference between HEAD and the working tree?

Yes, see above.

> Does HEAD change when I cg-switch/git-checkout?

Yes. But a checkout changes HEAD by making it point to a different
branch, while something like "git reset" will *also* potentially change
HEAD, but does so by staying on the same branch and making that branch
"reset" (aka jump) to another point in history.

So you can literally change HEAD two fundamentally different ways:

 - by switching branches (which includes making HEAD be a detached
   branch of its own)

 - by changing the state of the current branch (the most common form of
   this is just "git commit" - it will update HEAD by creating a new
   commit, but as mentioned, "git reset" can also do this by jumping
   around in history, and that's how you'd undo work entirely, for
   example).

> What is an object? Is it a set of patches? A tree snapshot?

An object is the lowest level of git information. It's an indivisible
and unchanging "thing", that can potentially point to other objects.
You can kind of think of it as an "inode" in a UNIX filesystem, and
like an inode, it can point to file data or be a directory (but unlike
an inode, it's immutable by design, and it can also be a "commit" or a
"tag" object).

So internally, git does have "tree snapshots" (not patches - git is
*purely* based on snapshotting states of the project), but they are not
a single object, they are built up from "tree objects" that point to
other tree objects or to "blob objects".

And a commit is literally a "commit object" that points to the snapshot
(the "tree object") that it's associated with, and the previous commits
(the "parents") that build up the history.

> What the heck is a branch? (Why does it have so many different
> definitions? I feel like every time I come across "branch" in the man
> pages, it means something different.)

Ok, hope I clarified that.

> More on branches: The wiki says that a group of commits linked
> together form a DAG. Does that mean every fork/clone/branch-create
> possibly doubles the number of branches. So if I fork and then
> remerge, do I have two branches?

Yes and no. When you do a clone, you do get your totally own set of
branches, but a branch is just a *pointer*. So it does _not_ duplicate
history in any way, you do *not* get:

> A -> B -> D
> A -> C -> D

But instead you get

        A -> B -> C -> D

as commits, and you now have a new pointer to D.

So creating a branch *literally* just creates a new pointer. In fact,
you can still create a new branch manually by doing

        echo "sha1-of-branch-goes-here" > .git/refs/heads/my-new-branch

and that is how the git scripts literally used to do it (well, slightly
simplified: verifying that the SHA1 is valid, and that the branch
didn't already exist).

So the branch really *is* just a named commit.

> Would D be the head of this branch? If so, then heads do not uniquely
> identify a branch?

A branch uniquely identifies a particular commit, but many branches can
point to the same commit (and the branches are considered "identical"
when they do that - you can have two different branches, but if they
point to the same thing they are identical in all respects except for
naming).

> Is there a standard revision notation? (Where my definition of
> "revision" is a tree snapshot. In SVN, it would be identified by a
> number.) `cg-diff -r A..B` works fine if A and B are branches, but how
> do I diff from an older revision to a newer revision? Can I diff
> between two revisions which haven't shared the same parent since 2006?

The "standard" revision notation is the SHA1 of the commit, but quite
frankly, you'd never use it. If you have two branches named A and B,
you'd generate the diff with

        git diff A..B

and it doesn't matter if they share a parent since yesterday, since
five years ago or whether they are related AT ALL. Git will happily
diff totally unrelated branches (if you imported two tar-balls
independently, they may not have any common history at all, but you may
still want to diff them if they are from the same project!)

> What about the master branch? Is there anything special about it? By
> special I mean, do any of the git or cogito commands implicitly assume
> that you are working with master? If git is truly decentralized, then
> wouldn't master be on an equal footing with all other branches?

Correct. The only thing that is special about master is that it's the
one that is created by "git init" (or "git clone", for that matter).

> What is a merge? My understanding of merge comes from the SVN book,

Forget SVN merges. SVN cannot do merges (SVN also cannot really do
branches - what SVN calls branches is some abhorrent and stupid copy of
a working tree with copying of the limited notion of history that SVN
knows about).

> where it was described as diff+apply. Diff takes 2 arguments, and
> apply takes 1 argument (if the patch is implicit).
> However, cg-merge only appears to take one branch. (There again a use
> of the word branch! Wouldn't commit or revision be a more accurate
> term?)

(You're likely better off using just "raw git" rather than cg these
days, so I'll talk about "git merge").

A "git merge" actually does have two branches: the current one, aka
HEAD, and the one you want to merge _into_ the current one. So when you
do

        git merge other-branch

it will merge 'other-branch' into the current branch (HEAD).

And no, it's not a "diff+apply" (although early and *very* broken
versions of cg implemented the data part that way), it's a much more
interesting operation that figures out the last common point from the
history, and does a series of three-way merges (especially if there
were *multiple* independent common history points), and then records
the set of parents in the result.

That, btw, is why SVN cannot do merges. It really *does* do a fancy
"diff+apply" that probably involves three-way operations too, but since
it doesn't actually remember the resulting history, it cannot be
considered a "merge". It didn't really merge the history - it just
smushed the *contents* of two branches together, and then totally threw
out all the really important bits.

> Lastly, the most important question of all, which may answer many of
> the questions above:
>
> Can you fill in the missing pieces, making corrections where
> necessary?
> (recommend unispace font)
>
> Command    | Reads                 | Writes
> cg-fetch   | remote branch         | corresponding branch in local repository
> cg-commit  | working copy          | HEAD
> cg-update  | remote branch         | working copy AND HEAD
> cg-merge   | branch & working copy | working copy
> cg-diff    | arguments             | STDOUT
> cg-push    |                       | remote branch (usually origin)
> cg-pull    | remote branch         |
> cg-restore |                       |

I'll use the git names (which are generally the same)

        Command       | reads                 | writes
        --------------+-----------------------+------------------
        git fetch     | remote branch(es)     | local branch(es)
        git commit    | local data            | HEAD
        git pull      | remote branch(es)     | HEAD
        git merge     | local branch(es)      | HEAD
        git diff      | local data            |
        git push      | local branch(es)      | remote branch(es)
        git reset     | ---                   | HEAD

and everything that writes HEAD will implicitly also update the working
tree (with the obvious exception of "git commit" - since it's filling
in the HEAD with the current state, it's obviously not going to update
the working tree).

The "local data" is really a combination of "local branches, staging
area and working tree": neither "git diff" nor "git commit" really
works purely on the working tree, they both will mix using the staging
area, the working tree, and pure branch information depending on exact
flags.

And note that most of the operations really can work on multiple
branches (that's not true in cg). IOW, you can actually merge multiple
branches in one go (the end result is called an "octopus merge",
because it looks cool and has many "legs" when you see the merge
history in a bottom-to-top kind of thing like gitk).

> On cg-fetch, is the remote branch necessarily remote? Or can you fetch
> from local

You can always consider the local tree to be a remote one: just use
".". So

        git merge other-branch

is basically the same as

        git pull . other-branch

> cg-switch-branches? What does "corresponding branch in local
> repository" mean? Does cg-fetch touch your working copy?

Confusing cogito terminology.
The pure git stuff is actually clearer. And in git, you can specify
what the "corresponding" branch is for any local branch.

For example, if you just do the "git clone" of the git repository, then
assuming you have a recent enough git, you can look into the
".git/config" file of the result, and you should see something like
this:

        [remote "origin"]
                url = master.kernel.org:/pub/scm/git/git.git
                fetch = +refs/heads/*:refs/remotes/origin/*
        [branch "master"]
                remote = origin
                merge = refs/heads/master

which describes a remote repository ("origin") and tells you what
branches should be fetched when you do a "git fetch origin", but it
*also* describes the local branch "master", and says that when you do a
"git pull", it should merge the *remote* branch "refs/heads/master"
from "origin".

> What is the difference between cg-restore and cg-seek?

Don't use them. Cogito confusion.

                Linus