Re: Orphan branch not well-defined?

Chris Torek <chris.torek@xxxxxxxxx> · Tue, 21 Nov 2023 17:42:42 -0800

On Tue, Nov 21, 2023 at 4:36 PM Craig H Maynard <chmaynard@xxxxxx> wrote:
> [git checkout and git switch treat --orphan differently]
>
> Leaving aside the question of whether or not this is a bug,

Just to answer the implied question: this is intentional.

> there doesn't appear to be any formal definition of the term "orphan branch"
> in the git documentation. Am I missing something?

Whether or not it's defined anywhere, it's not explained well.  This is
not surprising: it is hard to explain well!  Git uses two phrases for
this: "orphan branch" and "unborn branch". To understand them
properly, let's start at the real beginning.  Bear with me for a
moment here.

In Git, the identity of a commit -- the way that Git locates the
commit internally -- is its hash ID.  (Aside: until the SHA-256
conversion, there was only ever one hash ID for any commit ever made
anywhere.  Now that Git supports both SHA-1 and SHA-256, there are two
possible IDs, depending on which scheme you're using.)  It's possible,
at least in theory, to use Git without ever creating a branch name:
all you have to do is memorize these random-looking hash IDs.  But
that's not how people's brains work, and it's quite impractical.  So
Git offers us branch names, like "main" or "master", "dev" or
"develop", and so on.
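To make "the hash ID is the commit's identity" a bit more concrete, here is a small sketch of how Git derives an object ID.  Git hashes a short header (`<type> <size>` plus a NUL byte) followed by the object's raw content, and the same rule applies to blobs, trees, commits and tags.  The function name `git_object_id` is hypothetical, and the example hashes a blob (file content) rather than a commit because a blob's content is easy to write out by hand.  Switching `algo` from SHA-1 to SHA-256 gives the same content a different ID, which is the "two possible IDs" point above.

```python
import hashlib


def git_object_id(obj_type: str, data: bytes, algo: str = "sha1") -> str:
    """Hypothetical helper: compute the ID Git would give an object.

    Git hashes "<type> <size>\\0" followed by the raw content.
    """
    header = f"{obj_type} {len(data)}\0".encode()
    return hashlib.new(algo, header + data).hexdigest()


# SHA-1 ID of an empty file: e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id("blob", b""))
# The same content under SHA-256 gets a different (64-hex-digit) ID.
print(git_object_id("blob", b"", algo="sha256"))
```

For comparison, `git hash-object` on a file with the same content should print the same SHA-1 value in a SHA-1 repository.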

In Git, a branch name is just a human-readable name for one of Git's
internal hash IDs, with a special and very useful property that
distinguishes it from a tag name.  Each tag name is a human-readable
name for a hash ID too; they just lack the special property of the
branch names.  We won't get into all the properties here, though.  For
the moment, we just need to know that the name stands in as a
memorable version of the ID.

As a result, a Git branch name literally cannot exist unless it
identifies one specific commit!  We call that one specific commit the
"tip commit" of that branch (which introduces a whole new confusion,
of whether a "branch" is *one commit* or *many commits*, but again we
won't get into this here).

This leaves us with a big "chicken or egg" problem
(https://en.wikipedia.org/wiki/Chicken_or_the_egg).  Suppose we've
just created a new, empty repository, which by definition has no
commits in it: it's *new*, and *empty*.  How many branch names can we
have in this new, empty repository?  We've just claimed that a branch
name must identify some specific commit, and we have no commits, so
the answer is: none.  We cannot have any branch names at all.
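The argument above can be modeled as a toy table of names.  Branch and tag names map to commit IDs, and a branch name cannot be created unless the ID it would hold belongs to an existing commit.  This is a hypothetical model (the class `Refs` and its fields are made up for illustration), not how Git stores refs on disk.

```python
from dataclasses import dataclass, field


@dataclass
class Refs:
    """Hypothetical toy model: branch and tag names are name -> ID entries."""
    commits: set = field(default_factory=set)     # IDs of commits that exist
    branches: dict = field(default_factory=dict)  # branch name -> tip commit ID
    tags: dict = field(default_factory=dict)      # tag name -> commit ID

    def create_branch(self, name: str, commit_id: str) -> None:
        # A branch name must identify one specific, existing commit.
        if commit_id not in self.commits:
            raise ValueError(f"cannot create branch {name!r}: no such commit")
        self.branches[name] = commit_id


empty = Refs()  # a new, empty repository: no commits at all
try:
    empty.create_branch("main", "any-id")
except ValueError as err:
    print(err)  # there is no commit for "main" to identify
print(empty.branches)  # {}
```

With no commits, every attempt to create a branch fails, so the branch table of a new repository stays empty.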

But -- here's the other paradox -- whenever we make a *new* commit,
it's to be *added on to the current branch*.  But we have an empty
repository, which cannot have any branch names, so how do we know what
the "current branch" even *is*?

** Unborn Branch is the better term **

Now that we understand the basic problem -- that a new repository
can't have any branches, but that we want Git to *create* a branch
when we make that very first commit -- we can see what an "orphan" or
"unborn" branch is all about.  It papers over our chicken-or-egg
problem.  We simply save the *name we want Git to create* somewhere,
and carry on as usual.  When, eventually, we do make that first
commit, Git says: "OK, I should add this new commit to the current
branch br1", or whatever the name is.  Git then creates the new commit
*and* creates the branch name, all as one big operation.  Now the
branch exists: it's born.
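Here is a minimal sketch of the "save the name, create it later" trick, assuming a toy repository in which commit IDs are list indices instead of real hashes.  In real Git, the saved name lives in the symbolic ref HEAD: a fresh repository's `.git/HEAD` holds something like `ref: refs/heads/main` even though no `main` branch exists yet.  The classes `Repo` and `Commit` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Commit:
    message: str
    parent: Optional[int]  # index of the parent commit, or None for a root commit


@dataclass
class Repo:
    """Hypothetical toy repository: HEAD stores a branch *name*."""
    head: str = "main"                                  # name we want created
    branches: dict = field(default_factory=dict)        # name -> commit index
    commits: list = field(default_factory=list)         # all commits, in order

    def is_unborn(self) -> bool:
        # Unborn: HEAD names a branch that does not exist yet.
        return self.head not in self.branches

    def commit(self, message: str) -> int:
        parent = self.branches.get(self.head)  # None while the branch is unborn
        self.commits.append(Commit(message, parent))
        new_id = len(self.commits) - 1
        # Making the commit and creating/updating the name happen together.
        self.branches[self.head] = new_id
        return new_id


repo = Repo(head="br1")
print(repo.is_unborn())  # True
repo.commit("first commit")
print(repo.is_unborn())  # False
print(repo.branches)     # {'br1': 0}
```

The first `commit()` call is the moment the branch is "born": before it, `br1` exists only as the name HEAD is waiting to create.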

When we have a normal (not-unborn) branch and create a new commit, Git
creates the new commit as usual and then -- here's the unique property
of *branch names* that makes them so special -- *updates* the branch
name to hold the new commit's new hash ID.  Git also makes sure that
the new commit we just made links back to the commit that *was* the
tip commit of the branch, just a moment ago.  So this is how branches
"grow" as you make commits.  The *name* holds only the *last* commit
hash ID.  Each commit holds the *previous* hash ID, so that Git can
start at the end of a branch and work backwards.  The previous (or
parent) commit has its own parent, which has another parent, all the
way back to the beginning of time: the very first commit, which has no
parent.

This is also where the dual meaning of "branch" clears up somewhat: a
branch is both the tip commit *and* the whole-chain-of-commits,
starting at the tip and working backwards.  How do we know which
meaning someone intends?  Sometimes it's clear from context.  Sometimes
it's not clear.  Sometimes whoever used the word isn't even aware of
the issue!
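Both halves of this (the name moving forward, and the chain being read backwards) fit in a few lines.  The sketch below is a hypothetical toy model with single-letter commit IDs.  Each commit records only its parent, and the branch name records only the newest commit.  Reading `branches["main"]` gives the "branch = one commit" meaning, and `history("main")` gives the "branch = many commits" meaning.

```python
from typing import Optional

# Hypothetical toy model: commit ID -> parent ID (None marks a root commit).
parents: dict = {}
# Branch name -> ID of its tip commit (the name holds only the latest ID).
branches: dict = {}


def commit_on(branch: str, new_id: str) -> None:
    """Add commit `new_id` on top of `branch`, then move the name to it."""
    parents[new_id] = branches.get(branch)  # link back to the old tip (None if unborn)
    branches[branch] = new_id               # the name now holds the newest ID


def history(branch: str) -> list:
    """Start at the branch tip and follow parent links back to the root."""
    result = []
    current: Optional[str] = branches.get(branch)
    while current is not None:
        result.append(current)
        current = parents[current]
    return result


for cid in ("A", "B", "C"):
    commit_on("main", cid)

print(branches["main"])  # C  ("branch" as the tip commit)
print(history("main"))   # ['C', 'B', 'A']  ("branch" as the chain)
```

Real Git history can fork and merge, so a commit may have several parents.  This sketch deliberately keeps only the single-parent case described above.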

** The `--orphan` options **

That weird state, where no branches can exist yet you want to be "on"
the branch you're going to create, is forced on you only in a new,
empty repository.  But since Git has to handle that state anyway, it
can let you enter it any time.  That's what `--orphan` was originally
invented for: to go back into that state even if you have some commits.

That is, `git checkout --orphan <new-branch>` meant: make the current
branch a new, unborn branch called <new-branch>, just like the branch
in a new and totally-empty repository.  Then when I make my next
commit, that will create a new commit that has no parent commit.
Whether (and when and how) this is
actually useful is another question entirely, as is the reason for
switch and checkout behaving differently in terms of how they treat
the index and working tree.  But this is the heart of the option: it
means "go into the unborn branch state".
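Here is a rough sketch of what `--orphan` does to HEAD, using the same kind of hypothetical toy model as before: point HEAD at a name that does not exist yet, so the next commit becomes a new root.  The class `ToyRepo` and its `orphan` method are made up for illustration.  The sketch deliberately ignores the index and working tree, which is exactly where `git checkout --orphan` and `git switch --orphan` differ.

```python
from typing import Optional


class ToyRepo:
    """Hypothetical model of HEAD, branch names and parent links."""

    def __init__(self, default_branch: str = "main") -> None:
        self.head = default_branch                    # name of the current branch
        self.branches: dict = {}                      # name -> tip commit ID
        self.parents: dict = {}                       # commit ID -> parent ID

    def commit(self, new_id: str) -> None:
        parent: Optional[str] = self.branches.get(self.head)  # None if unborn
        self.parents[new_id] = parent
        self.branches[self.head] = new_id

    def orphan(self, name: str) -> None:
        """Re-enter the unborn-branch state under a new name."""
        if name in self.branches:  # this toy version refuses existing names
            raise ValueError(f"branch {name!r} already exists")
        self.head = name           # HEAD now names a branch with no commit


repo = ToyRepo()
repo.commit("A")
repo.commit("B")
repo.orphan("fresh")
repo.commit("X")
print(repo.parents["X"])  # None: X is a new root commit
print(repo.branches)      # {'main': 'B', 'fresh': 'X'}
```

Existing commits are untouched.  `main` still holds `B`; only the next commit made on `fresh` starts a new, disconnected line of history.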

(Side note: there are other ways to solve the "new repository"
problem, and there are other ways to define "branch".  Other version
control systems sometimes use other ways.  Git's rather peculiar
definition of branch was rare, perhaps even unique, in the early days
of Git.)

Chris