I find that to really get people to understand this, it helps
to start them out by having them forget about the word "branch"
entirely.  They should instead think of commits -- which, in
Git, are found by their big ugly hash IDs -- as the unit of
storage in Git.  (This is *mostly* accurate, though we can get
picky about other units of storage, but let's not do that here.)
So given a hash ID like 0f2ec7a4eeeec3045d7680e98f958740cd29bd77
-- which is too big and ugly for people to deal with, but we
can shorten it to say 0f2ec7a4eee -- we've found one specific
commit, assuming it exists.  That commit contains a complete
snapshot of every file that Git "knew about" at the time you
(or whoever) made that commit.  That's most of what most people
care about.
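
To see this for yourself, here's a minimal sketch in a throwaway
repository.  The file name and the author identity are made up for
the example, and the hash IDs you get will differ from mine:

```shell
# Throwaway repository; every name below is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"

echo hello > greeting.txt
git add greeting.txt
git commit -q -m "first commit"

full=$(git rev-parse HEAD)               # the full, big ugly hash ID
short=$(git rev-parse --short=11 HEAD)   # the same ID, shortened
echo "$full"
echo "$short"

git cat-file -t "$short"                 # prints "commit"
git ls-tree -r --name-only "$full"       # files in the snapshot
```

Either spelling of the hash ID finds the same commit, and the
snapshot lists greeting.txt.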

But that commit also contains some metadata: information such
as the name and email address of the person who made the commit,
and some date-and-time stamps, and so forth.  One of these
metadata items is a list of raw hash IDs of *previous* commits,
usually exactly one entry long.  We call that hash ID the
parent, or parents if it's longer than one entry, of the
commit.
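
You can look at this metadata directly.  A small sketch, again in a
scratch repository with a made-up identity:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"

git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# Raw commit object: tree, parent, author, committer, then the message.
git cat-file -p HEAD

parent=$(git rev-parse "HEAD^")          # the single parent of HEAD
git log -1 --format='%P' HEAD            # parent hash ID(s) from the metadata
```

The "parent" line in the raw commit and the %P output both show the
hash ID of the commit made just before this one.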

The way Git stores "history" is precisely through these parent
IDs.  That is, given one commit ID like 0f2ec7a4eee, Git can
find *that* commit.  But that commit tells Git which commit comes
before it: 114193fd391, for instance.  Git can then find *that*
commit, which has another parent, which Git can find, which
has yet another parent, and so on.  By working backwards, one
commit at a time, Git finds the history of the repository.
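
Here's a rough sketch of that backwards walk done by hand, following
only the first parent of each commit.  The loop is mine, for
illustration; in practice "git log" or "git rev-list" does the walk
for you:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"

for n in 1 2 3; do
    git commit -q --allow-empty -m "commit $n"
done

commit=$(git rev-parse HEAD)
count=0
while [ -n "$commit" ]; do
    git log -1 --format='%h %s' "$commit"
    count=$((count + 1))
    # %P lists the parent hash IDs; it is empty for the very first commit.
    commit=$(git log -1 --format='%P' "$commit" | cut -d' ' -f1)
done

git rev-list HEAD      # Git's own version of the same walk
```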

And *now* it's time to consider the word "branch".  This word
has two meanings: it means both 0f2ec7a4eee, the *last* commit
on the branch, and also 0f2ec7a4eee, plus 114193fd391, plus
whatever comes before that, plus everything all the way back
in time to the very first commit ever.  So if this is "br-1",
then "br-1" means both "0f2ec7a4eee" and "everything leading up
to 0f2ec7a4eee".  Which meaning gets used depends on context.

But there's another funny thing here.  When you're on "br-1"
and you make a *new* commit, two things happen:

 1. The new commit gets 0f2ec7a4eee as its list-of-previous
    commit hash IDs.  That means that from whatever hash ID the
    new commit has -- let's say a5678xxx... -- Git will be able
    to work backwards to 0f2ec7a4eee and 114193fd391 and so on.

 2. The branch name, "br-1", is rewritten to mean a5678xxx.
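
Both steps are easy to watch happen.  A minimal sketch, assuming a
scratch repository whose only branch is named br-1:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br-1    # call the initial branch br-1
git config user.name "Example Author"
git config user.email "author@example.invalid"

git commit -q --allow-empty -m "older commit"
before=$(git rev-parse br-1)             # what br-1 means right now

git commit -q --allow-empty -m "newer commit"
after=$(git rev-parse br-1)              # br-1 now means the new commit

echo "br-1 was $before"
echo "br-1 is  $after"
git rev-parse "$after^"                  # the new commit's parent: $before
```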

No *existing* commit changes at all.  In fact, it's impossible
to change a commit.  When you use "git commit --amend", you're
participating in a lie (a useful little lie to be sure, but a
lie): you don't change the existing commit, you just make a new
and improved commit, whose parent is the same as the parent of
the current commit.  The current commit then gets shoved out
of the way so that the new commit links to the current commit's
parent, instead of the current commit itself.  Graphically,
Git might replace this:

     ... <- 114193fd391 <- 0f2ec7a4eee   <-- br-1

with this:

                        0f2ec7a4eee  [lost / abandoned]
                       /
     ... <- 114193fd391 <- b0123456789   <-- br-1

If there's any *other* way to find 0f2ec7a4eee, however, well,
it's still there, still holding (forever) all the files that
it holds.
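
Here's a small sketch of the "--amend" lie in action.  The file and
messages are invented.  The reflog is one such "other way" to find
the shoved-aside commit, at least until its entry expires:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"

echo one > file.txt && git add file.txt && git commit -q -m "base"
echo two > file.txt && git add file.txt && git commit -q -m "tipo in mesage"

old=$(git rev-parse HEAD)
git commit -q --amend -m "typo in message, fixed"
new=$(git rev-parse HEAD)

echo "old: $old"
echo "new: $new"                  # a different hash ID: a different commit
git rev-parse "$old^" "$new^"     # prints the same parent hash ID twice
git cat-file -t "$old"            # prints "commit": the old one still exists
git rev-parse 'HEAD@{1}'          # the reflog still finds the old commit
```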

Anyway, once you grasp this, it becomes possible to understand
what happens with files when you make commits.  But now we have
to dive into another aspect of Git.

 ** your working tree **

The files stored under a commit's hash ID are permanent and
unchanging.  This is just what we want for revision control: we
*want* to get the old files back, even if there are mistakes in
them.  But it's not what we want for doing *new work*: we need to
be able to rewrite files to correct mistakes and/or add new stuff.

To enable this, Git will "check out" a commit by copying, from
the permanent store, the contents of all the committed files.
These copies go into your "working tree" or "work-tree", and
here they take the form of ordinary files, which you can modify
to your heart's content.
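
A quick sketch of the difference between the two copies, using an
invented file name:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"

echo "committed text" > story.txt
git add story.txt
git commit -q -m "add story"

echo "work in progress" >> story.txt   # edit the working-tree copy

cat story.txt                # working tree: both lines
git show HEAD:story.txt      # committed copy: still just "committed text"
```

The working-tree file changes freely while the copy inside the
commit stays exactly as it was.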

In most other version control systems, that's the end of the
story, because you make new commits from your working tree.  Git
again differs here, as it has a thing it calls the "index" or
"staging area", but we won't get into these details here other
than to mention that you generally need to run "git add" on new
or changed files before committing.  You "git add" any updates
and then run "git commit" and Git makes a new commit:

     ... <- 114193fd391 <- 0f2ec7a4eee <- a1234567890 <-- br-1
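
As a rough illustration of the add-then-commit sequence (file name
and contents invented for the example):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "add notes"

echo "second draft, somewhat longer" > notes.txt   # working tree only
git status --short           # " M notes.txt": modified, not yet added
git add notes.txt            # copy the update into the index
git status --short           # "M  notes.txt": ready to be committed
git commit -q -m "update notes"

git show HEAD:notes.txt      # the new commit holds the second draft
```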

If you added a totally-new file to commit a1234567890, well,
it's there in that commit.  If you check out that commit,
that file comes out into your working tree.  If you make
a new branch name *now*, well, let's draw that:

     ... <- 114193fd391 <- 0f2ec7a4eee <- a1234567890 <-- br-1
                                                     \
                                                      `- br-2

Your new branch name *also identifies the new commit* by its
hash ID, so it will contain the new file.  But suppose you make
the new branch name *before* this point?  That is, you have:

     ... <- 114193fd391 <- 0f2ec7a4eee <-- br-1
                                      \
                                       `- br-2

If you now make a new commit while "on" branch br-1, you get:


     ... <- 114193fd391 <- 0f2ec7a4eee <- a1234567890 <-- br-1
                                      \
                                       `- br-2

The name br-2 still identifies commit 0f2ec7a4eee, which does
not have the new file in it.
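
The same situation as a sketch you can run, with made-up file names
standing in for the real ones:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br-1    # call the initial branch br-1
git config user.name "Example Author"
git config user.email "author@example.invalid"

echo base > base.txt && git add base.txt && git commit -q -m "base"
git branch br-2              # br-2 names the same commit as br-1 for now

echo new > new.txt && git add new.txt
git commit -q -m "add new.txt"           # made while "on" br-1

git rev-parse br-1 br-2                  # two different hash IDs now
git ls-tree -r --name-only br-1          # base.txt and new.txt
git ls-tree -r --name-only br-2          # base.txt only
```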

To make things still-more-confusing, if you create new files in
your working tree, but *do not* add and commit them, Git doesn't
"know about" the files and does not store them in the commits.
(This is where "git add" comes in again: if you didn't use it, Git
treats the file as an "untracked file".)  Any such file just hangs
around in your working tree: Git neither modifies nor removes
the untracked file.[2]
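
A short sketch of an untracked file hanging around (the file names
are invented):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"

echo tracked > tracked.txt
git add tracked.txt
git commit -q -m "tracked file only"

echo scratch > scratch.txt       # created, but never "git add"ed
git status --short               # "?? scratch.txt": untracked

git commit -q --allow-empty -m "another commit"
git ls-tree -r --name-only HEAD  # tracked.txt only
ls                               # scratch.txt is still sitting here
```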

 ** that's why this is a bit messy **

Given the dual meaning of the word "branch" and the fact that
we don't know whether you meant one specific commit hash ID,
or some other specific commit hash ID, or a chain of commits
ending in a specific hash ID, we can't really say what "should"
happen.  But you can find out what *does* happen by applying this
principle.  Use

    git rev-parse main

to find out which commit hash ID "main" means right now, and use
similar "git rev-parse" commands to find out which specific commit
hash IDs other names mean right now.  Or: use "git log --graph
--decorate --oneline --branches" to help you visualize the
chain(s) of commit(s) reachable by starting at any particular
branch label and working backwards.
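
If you have several branch names, "git for-each-ref" can list them
all with their hash IDs in one go.  A sketch, assuming a scratch
repository with the two names br-1 and br-2:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br-1    # call the initial branch br-1
git config user.name "Example Author"
git config user.email "author@example.invalid"

git commit -q --allow-empty -m "shared commit"
git branch br-2
git commit -q --allow-empty -m "only on br-1"

# Every branch name and the commit it means right now.
git for-each-ref --format='%(refname:short) %(objectname:short)' refs/heads

# The chains of commits reachable from those names.
git log --graph --decorate --oneline --branches
```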

 ** footnotes **

[1] https://www.reddit.com/r/lotr/comments/1608zdc/question_on_advices_from_elves/

[2] There's an exception to this rule.  Suppose some historical
commit has a file with the same name as some existing untracked
file, and you ask Git to check out the historical commit.  To do
so, Git must replace the untracked file with the historically-tracked
file from that old commit.  If you then switch *back* to the newest
commit, in which the file doesn't exist, Git has to remove the
historical-commit copy that it copied out to the working tree.  The
unsaved work that was hanging around as an untracked file was
already destroyed when that copy overwrote it.  Git has a number of precautions against
this kind of clobbering unsaved work, but there are some corner
cases here that are problematic.  If you need to work with a
historical commit *and* a more recent commit that might have files
with colliding names, consider using "git worktree" to make a
place to examine the historical commits.
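
A minimal sketch of that "git worktree" approach.  The paths and
file names are invented, and the historical commit goes into its
own separate directory so nothing in the main working tree is
touched:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"

echo old > file.txt && git add file.txt && git commit -q -m "historical"
old=$(git rev-parse HEAD)
git rm -q file.txt && git commit -q -m "remove file.txt"

echo "unsaved work" > file.txt       # untracked, same name as the old file

wt="$(mktemp -d)/historical"         # a directory that does not exist yet
git worktree add --detach "$wt" "$old"

cat "$wt/file.txt"                   # the historical copy: "old"
cat file.txt                         # still "unsaved work"
```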

Chris




