Re: [PATCH] Proof-of-concept patch to remember what the detached HEAD was

Björn Steinbrink <B.Steinbrink@xxxxxx> · Fri, 16 Oct 2009 10:27:55 +0200

On 2009.10.16 02:02:09 -0400, Daniel Barkalow wrote:
> On Fri, 16 Oct 2009, Björn Steinbrink wrote:
> 
> > On 2009.10.15 14:54:18 -0700, Junio C Hamano wrote:
> > > If it is very important to support:
> > > 
> > >     $ git checkout --look-but-not-touch origin/next^
> > > 
> > > then James's approach would not be very useful, as we do have to detach
> > > HEAD and implement the "do not touch" logic for detached HEAD state
> > > anyway, so we might just use the same logic we would use for origin/next^
> > > when checking out origin/next itself.
> > 
> > I don't have any numbers backing this up, but my gut feeling says that
> > most cases of "Where have my commits gone?" that I have seen on #git
> > were due to "git checkout HEAD~2"-like actions. Either because the user
> > assumed SVN-like behaviour (you can't commit until you do "svn up", like
> > "git reset --merge HEAD@{1}") or thought that "git checkout
> > <committish>" would act like "git reset --hard <committish>".
> > 
> > For the latter I fail to envision any solution except for
> > education (and I have no idea why the user expected checkout to work
> > like reset).
> > 
> > The former can be solved by the proposed extra information in HEAD,
> > forbidding changes to HEAD that make it reference a commit that's not
> > reachable through the head stored in the extra information[*1*] and providing
> > some command that acts like "svn up".
> > 
> > This seems quite different from the plain "forbid committing" or "detach
> > and know how you get there", but more like "detach and know where you're
> > coming from".
> 
> What's the state before the "git checkout HEAD~2"?
> 
> If it's:
> 
> $ git checkout origin/some-obscure-branch
> (get curious about the commit a bit back)
> $ git checkout HEAD~2

IIRC, most of the time it was:
git checkout master # not detaching
git checkout HEAD~2

Another version I recall (but that's what I use myself regularly, so I
might be biased and think that it's more common than it actually was)
is:

git checkout master
git log # Find commit, copy hash
git checkout <hash> # Pasting the copied hash

> And then the user doesn't know how to get back to where they were, then it 
> should work if git had stored "origin/some-obscure-branch~2" at this point 
> (having substituted "origin/some-obscure-branch" (the previous extra info) 
> for HEAD). Then we could have a "git up" that would discard modifiers from 
> the extra info and check that out. Or users might find "git checkout 
> origin/some-obscure-branch" obvious enough if git is reporting something 
> related.

I'd not put "origin/some-obscure-branch~2" in there, but just
"origin/some-obscure-branch". Rationale: The ~2 modifier may become
invalid when you do "git fetch". And I don't see any value in having
that modifier, and even if there are some corner-cases, those could use
"git describe" or so, to get the modifiers on the fly.

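As a sketch of the "on the fly" part: if only the head name were
stored, the modifier could be recomputed when needed. This assumes a
throwaway repository with empty commits, a made-up head called "topic"
standing in for origin/some-obscure-branch, and "git name-rev" (which
"git describe --contains" uses internally) as one possible "or so"
command. The "stored" variable and the c helper are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
git init -q
git symbolic-ref HEAD refs/heads/topic          # "topic" is a made-up head name
c() { git commit -q --allow-empty -m "$1"; }    # helper: empty commit with subject $1

c A; c B; c C                                   # A---B---C (topic)
git checkout -q topic~2                         # detach HEAD at A

stored="topic"                                  # hypothetical extra info: head name only
# Recompute the modifier relative to the stored head when it is needed
computed=$(git name-rev --name-only --refs="refs/heads/$stored" HEAD)
echo "stored: $stored, computed on demand: $computed"
```

Here the computed name is "topic~2"; after the head moves, the same
command would simply report a different modifier.
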
> I know I often find my git.git repos on "* (no branch)", and I don't 
> remember if I checked that out as origin/master or origin/next. And that's 
> an important clue as to when I'd been doing there previously, and what I 
> might want to do next. Perhaps these users are having a similar problem, 
> where they're relying on git to remember what they were doing?

Hm, maybe. I'm more inclined to think that they assumed that "git
checkout <branch_head>" 'binds' them to that branch head. But git allows
them to jump around freely, the 'binding' is very weak.

SVN has:
"svn up": Get an older/newer version of the branch I'm on
"svn switch": Switch to a different branch

You cannot jump around without binding yourself to any branch.

git has:
"git checkout": Go anywhere, bind to that till the next checkout.

With "git checkout <non-branch-head>" working like a temporary unnamed
branch head.

In my view, there's the huge conceptual difference that svn has named
branches, while git only has named branch heads, that have a history
(reflog) that isn't necessarily even remotely similar to that of the
branch it currently points at.

          G---H---I (bar)
         /       /
A---B---C---D---E (master)
     \
      F (foo)

In SVN, you'd have a history that describes how "bar" came into its
current state, consisting of: G, H, I (Not following the copy at
G).

In git, you have a history that describes how commit I came into
existence (not branch head "bar"!), that is: A, B, C, D, E, G, H, I.

And the actual history for "bar" in git (the reflog) might be as weird
as: E, F, H, I, B, I. Jumping wildly across the commit DAG.
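
The difference between the two histories can be seen in a throwaway
repository. This is only a sketch: it uses a linear A---B---C history
instead of the graph above, moves "bar" around with "git branch -f",
and the c helper is made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
git init -q
git symbolic-ref HEAD refs/heads/master
c() { git commit -q --allow-empty -m "$1"; }    # helper: empty commit with subject $1

c A; c B; c C                 # A---B---C (master)
git branch bar master~2       # bar is created at A
git branch -f bar master      # ... jumps to C
git branch -f bar master~1    # ... back to B
git branch -f bar master      # ... and to C again

# Ancestry of the commit that bar points at right now
commit_history=$(git log --format=%s bar | tr -d '\n')
# Where the head itself has been (its reflog), newest entry first
head_history=$(git log -g --format=%s bar | tr -d '\n')
echo "commit history: $commit_history"
echo "head history:   $head_history"
```

The commit history comes out as C, B, A, while the head history is
C, B, C, A: the head visited C twice and was at A only when created.
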

My view is that with git you're never "on a branch", but you have an
active branch head (possibly unnamed [detached HEAD]) that marks a tip,
where the DAG grows. A branch, to me, has an extent. In the above graph,
G, H, I is a branch, and F is a branch, but "bar" is not. "bar" has no
extent, it's just where the "G, H, I" branch might possibly grow.

When you do "git checkout bar~2", you're not on "no branch". Your active
branch head is just unnamed. The branch is yet to be born (unless you
consider e.g. "A, B, C, G" to be your branch), but at least after doing
a "git commit", you'll have:

            J (HEAD)
           /
          G---H---I (bar)
         /       /
A---B---C---D---E (master)
     \
      F (foo)

And then you clearly do have a branch "J" there. At least if you stop saying
that a branch is just its head. And "git branch" saying that you're on
"no branch" makes no sense at all then.

Git simply doesn't expose branches in the sense of "a series of commits".
To get that, you need things like "git log master..bar" to get the "G, H,
I" branch.
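
For what it's worth, the graph from above can be rebuilt in a throwaway
repository to compare the two views. A sketch, assuming empty commits
whose subjects are the letters from the graph; the c helper is made up
for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
git init -q
git symbolic-ref HEAD refs/heads/master
c() { git commit -q --allow-empty -m "$1"; }    # helper: empty commit with subject $1

c A; c B
git branch foo                        # foo forks at B
c C
git branch bar                        # bar forks at C
c D; c E                              # master: A---B---C---D---E
git checkout -q foo; c F
git checkout -q bar; c G; c H
git merge -q --no-edit -m I master    # I gets the parents H and E

# Everything reachable from the head "bar": the history of commit I
history_of_I=$(git log --format=%s bar | LC_ALL=C sort | tr -d '\n')
# What "bar" has and "master" lacks: the part with an extent
extent=$(git log --format=%s master..bar | tr -d '\n')
echo "history of I (sorted): $history_of_I"
echo "master..bar:           $extent"
```

The first list is A, B, C, D, E, G, H, I (F is not part of it); the
second is I, H, G.
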

SVN has this clear "this branch has this name" concept, git does not. I
prefer git's way. But maybe it should simply not use the term "branch"
when it means "branch head".

And the glossary even somewhat agrees with me, although it disagrees
with itself:

  branch
    A "branch" is an active line of development. The most recent commit
    on a branch is referred to as the tip of that branch. The tip of
    the branch is referenced by a branch head, which moves forward as
    additional development is done on the branch. A single git
    repository can track an arbitrary number of branches, but your
    working tree is associated with just one of them (the "current" or
    "checked out" branch), and HEAD points to that branch.

So a branch is an active line of development. If I do "git checkout
foo~2" and "git commit", I clearly do have an active line of
development. So it's not "no branch".

And a "branch head" references the tip (point of growth) of a branch.
It's not identical to a branch.

But then there comes HEAD, which is said to point to a branch, while it
of course points to a branch head. I guess the glossary is outdated WRT
detached HEAD, but if you ignore the implementation details, it could
still be said that in case of a detached HEAD, HEAD points to an unnamed
branch head.
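
The J example from above shows this in practice. A sketch, assuming a
throwaway repository that contains only the G---H---I part of the
graph (so I is not a merge here); the c helper and the head name
"topic" are made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
git init -q
git symbolic-ref HEAD refs/heads/bar
c() { git commit -q --allow-empty -m "$1"; }    # helper: empty commit with subject $1

c G; c H; c I                 # G---H---I (bar)
git checkout -q bar~2         # detach HEAD at G
c J                           # the DAG grows at the unnamed head

# A detached HEAD is not a symbolic ref, so there is no head name to report
if git symbolic-ref -q HEAD >/dev/null; then
    head_name=$(git symbolic-ref --short HEAD)
else
    head_name="(unnamed)"
fi
# Commits on the new line of development that "bar" does not have
new_commits=$(git log --format=%s bar..HEAD | tr -d '\n')
echo "active head: $head_name, new commits: $new_commits"

git checkout -q -b topic      # give the head a name; J itself is unchanged
```

Before the last command the active head has no name but already
carries J; afterwards the same commit is simply reachable as "topic".
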

The user manual also makes this distinction, but says that "when no
confusion will result, we often just use the term 'branch' both for
branches and for branch heads".

And the command man pages follow that "just use 'branch'" way:

"git checkout" is said to checkout (and possibly create) a branch, not a
branch head.

"git branch":
 - List, create or delete branches
 - --contains/--merged/--no-merged => show branches that...

But the clear winner is (from git-branch(1)):
"If the <commit> argument is missing it defaults to HEAD (i.e. the tip
of the current branch)."

Combine that with a detached HEAD and the corresponding "git branch" output.
The <commit> argument will default to the tip of no branch! Epic.

Don't get me wrong though. I like git's model, and think that its
anonymous branches with named branch heads offer a lot more than e.g.
SVN's named branches (and namespace pollution is just a minor factor).
But I'm more and more convinced that suggesting to the unknowing user,
who didn't read the "we use term A for both A and B" notice in the
manual, that git does have named branches (branches being a series of
commits) is a bad thing that leads to confusion (in contrast to what
the manual assumes).

Examples:

User asking about deleting a "branch" without deleting its commits:
http://colabti.de/irclogger/irclogger_log/git?date=2009-10-14#l2113

User asking whether deleting master will mess up other "branches":
http://colabti.de/irclogger/irclogger_log/git?date=2009-10-13#l2407

(I thought that there were two more in the last week, but I couldn't
find them, so maybe I was wrong, or they used some other word than
"delete", which I used to search the logs).

In both cases, the user wanted to delete the branch head, but was afraid
that that would kill the commits, as they were told that they will
delete the "branch" and assumed the "branch" to be all commits reachable
through the branch head.
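
What actually happens can be checked in a throwaway repository. A
sketch, assuming empty commits and the made-up c helper: deleting the
head "foo" removes the name, while the commit object stays in the
repository (until garbage collection eventually prunes it) and the
commits reachable from master are not affected at all.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
git init -q
git symbolic-ref HEAD refs/heads/master
c() { git commit -q --allow-empty -m "$1"; }    # helper: empty commit with subject $1

c A; c B                      # A---B (master)
git checkout -q -b foo; c F   # F on top of B (foo)
tip=$(git rev-parse foo)      # remember which commit foo points at
git checkout -q master
git branch -q -D foo          # delete the head "foo", i.e. the name

if git rev-parse -q --verify refs/heads/foo >/dev/null; then
    head_state=present
else
    head_state=gone
fi
object_type=$(git cat-file -t "$tip")               # is the object still there?
master_history=$(git log --format=%s master | tr -d '\n')
echo "head foo: $head_state, object: $object_type, master: $master_history"
```

The head is gone, the object is still a commit, and master still
lists B, A.
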

And I kind of doubt that this applies only to SVN refugees who are used
to SVN's meaning of branches; I suspect it also applies to people who
are new to any kind of VCS and "naively" apply the branch analogy from
real-world trees, where branches are more than just their tips.

And I believe that this is closely related to the detached HEAD thing.
See above for the "no branch" stuff that now doesn't even make any sense
to me anymore (even less after reading the git-branch(1) man page ;-)).
But also the fact that checkout is hardly about branches, but about
(possibly unnamed) branch heads. One might have certain branch heads
that are never rewound and thus might be more or less equal to a branch,
but that seems like it's almost(?) a special case if you consider what
"checkout" can actually do.

I'm not sure how this can be alleviated. Just saying "branch head"
instead of "branch" is more correct, but probably still doesn't really
express those differences that make git what it is. Making "git branch"
say "* (unnamed branch head)" instead of "* (no branch)" seems like a
good start, but the user manual would need a very close look to catch
all the text that stops making sense when you suddenly start to draw a
stronger distinction between a branch and a branch head. I'll look into
that over the weekend, but won't promise anything.

Björn, hoping that he didn't run too far off the track