Re: [PATCH] RFC: A new type of symbolic refs

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 17 Jul 2017 14:48:17 -0700

Stefan Beller <sbeller@xxxxxxxxxx> writes:

> +int read_external_symref(struct strbuf *from, struct strbuf *out)
> +{
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	const char *repo, *gitlink;
> +	int hint, code;
> +	struct strbuf **split = strbuf_split(from, 0);
> +	struct strbuf cmd_out = STRBUF_INIT;
> +
> +	if (!split[0] || !split[1])
> +		return -1;
> +
> +	repo = split[0]->buf + 5; /* skip 'repo:' */
> +	gitlink = split[1]->buf;
> +
> +	argv_array_pushl(&cp.args,
> +			"ignored-first-arg",
> +			"-C", repo,
> +			"ls-tree", "-z", "HEAD", "--", gitlink, NULL);
> +
> +	/*
> +	 * 17 accounts for '160000 commit ',
> +	 * the \t before path and trailing \0.
> +	 */
> +	hint = 17 + GIT_SHA1_HEXSZ + split[1]->len;
> +	code = capture_command(&cp, &cmd_out, hint);
> +
> +	strbuf_release(split[0]);
> +	strbuf_release(split[1]);
> +
> +	if (!code) {
> +		strbuf_reset(out);
> +		strbuf_add(out, cmd_out.buf + strlen("160000 commit "),
> +			   GIT_SHA1_HEXSZ);
> +	} else
> +		return -1;
> +
> +	return 0;
> +}

This may help the initial checkout, but to be useful after that, we
need to define what happens when an equivalent of "git update-ref
HEAD" is done in the submodule repository, when HEAD is pointing
elsewhere.  The above only shows read-only operation, which is not
all that interesting.

I _think_ a symbolic HEAD that points upwards to the gitlink entry in
the superproject's index is the easiest to understand and it is
something we can define clear and useful semantics for.

When a recursive checkout of a branch 'foo' is made in the
superproject, the index in the superproject would name the commit in
the submodule to be checked out.  We traditionally detach the HEAD
at the submodule to that commit, but instead we could say "check the
index of the superproject to see where the HEAD should be pointing
at" in the submodule.  Either way, immediately after such a
recursive checkout, "git status" inside the submodule would find
that the HEAD points at the commit recorded in the 'foo' branch of
the superproject and things are clean.

After you work in the submodule and make a commit, an equivalent of
"git update-ref HEAD" is done behind the scenes to update HEAD in the
submodule.  In the traditional world, that is done to detached HEAD
and nothing else changes, but if the symref HEAD points upwards into
the index of the superproject, what needs to be done is very obvious;
we do "git add submodule" in the superproject.  And this is not just
limited to creating a commit in the submodule.  "reset --hard HEAD~2"
in the submodule to rewind the HEAD by two commits would also be an
update to HEAD and through the symref-ness of the HEAD should result
in an update to the index of the superproject.
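
To make the "update goes through the symref" idea concrete, here is a
toy model, not git's refs code.  Every name in it (toy_submodule,
toy_update_head, HEAD_SUPER_INDEX, etc.) is made up for illustration,
and it assumes a superproject index with a single gitlink entry.  It
only shows that one update path can serve both "commit" and "reset
--hard HEAD~2" in the submodule:

```c
#include <assert.h>
#include <string.h>

#define HEXSZ 40 /* length of a hex SHA-1, like GIT_SHA1_HEXSZ */

/* Hypothetical: where the submodule's HEAD keeps its value. */
enum head_kind {
	HEAD_DETACHED,    /* traditional: value lives in the submodule */
	HEAD_SUPER_INDEX  /* sketched here: value lives in the superproject's index */
};

/* Toy superproject index holding exactly one gitlink entry. */
struct toy_super_index {
	char gitlink[HEXSZ + 1];
};

struct toy_submodule {
	enum head_kind kind;
	char detached[HEXSZ + 1];      /* used only when HEAD_DETACHED */
	struct toy_super_index *super; /* used only when HEAD_SUPER_INDEX */
};

static void set_oid(char *dst, const char *hex)
{
	assert(strlen(hex) == HEXSZ);
	memcpy(dst, hex, HEXSZ + 1);
}

/* Resolve HEAD: look upwards into the index if HEAD points there. */
static const char *toy_read_head(const struct toy_submodule *sub)
{
	if (sub->kind == HEAD_SUPER_INDEX)
		return sub->super->gitlink;
	return sub->detached;
}

/*
 * Equivalent of "git update-ref HEAD <hex>" in the submodule.  With
 * HEAD_SUPER_INDEX the write lands in the superproject's index, which
 * is what "git add submodule" would have done.
 */
static void toy_update_head(struct toy_submodule *sub, const char *hex)
{
	if (sub->kind == HEAD_SUPER_INDEX)
		set_oid(sub->super->gitlink, hex);
	else
		set_oid(sub->detached, hex);
}
```

A commit and a rewind in the submodule would both call
toy_update_head() with the new tip; with HEAD_DETACHED the toy index
is left alone, with HEAD_SUPER_INDEX it is the only thing that
changes.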

However, I do not think there is a good explanation of what it should
mean when this new-style symbolic HEAD points at a commit in the
superproject, whether it is limited to its HEAD or can be a tip of an
arbitrary branch that may not even be checked out.  These are not
something we can easily change without affecting the wider context.
Our submodule, when we make
a new commit, may be ready to advance, but our superproject and
other submodules may not be ready to be included in a new commit in
the superproject.

So I think the idea this patch illustrates is on to something
interesting and potentially useful, but I am not sure if it makes
sense to tie it to anything but the index of the superproject.

Even if we limit ourselves to pointing at the index of the
superproject, there probably are a handful of interesting issues
that need to be clarified (not in the sense of "such and such issues
exist, so this won't be a useful feature", but in the sense of "we'd
be able to do these useful things using this feature, and we need to
fill in more details"), such as:

 - Making new commits in the submodule available to the upstream.
   Just like a detached HEAD in the submodule, this is not tied to
   any concrete branch, and it is unclear how a recursive "push"
   from the superproject should propagate the changes to the
   upstream of the submodule.

 - Switching between branches that bind the same commit for the
   submodule in the superproject would work just like switching
   between branches that record the same blob for a path, i.e. it
   will carry forward a local modification.

 - The index entry in the superproject may now have to get involved
   in fsck and reachability study in the submodule as reachability
   root.  A corollary to this is that submodules behave more
   similarly to regular blobs wrt "git reset --hard" in the
   superproject, which is a good thing.  "git -C submodule commit &&
   git reset --hard" will create a new commit in the submodule, add
   it to the index of the superproject, and then lose that change
   from the index of the superproject, making the commit
   unreachable, just like "edit file && git add file && git reset
   --hard" in the superproject will make the blob that records the
   updated content of the file unreachable.
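
The reachability point in the last item can be sketched with a toy
graph walk.  This is not how fsck is implemented; toy_commit and
toy_reachable are hypothetical names, commits are small integers
rather than object names, each has at most one parent, and the only
root is the gitlink entry in the superproject's index:

```c
#include <assert.h>
#include <stddef.h>

/* Toy commit: index of its single parent, or -1 for a root commit. */
struct toy_commit {
	int parent;
};

/*
 * Return 1 if 'target' can be reached from any of 'roots' by following
 * parent links, 0 otherwise.  Assumes the parent chain is acyclic.
 */
static int toy_reachable(const struct toy_commit *commits, size_t nr,
			 const int *roots, size_t nr_roots, int target)
{
	size_t i;

	for (i = 0; i < nr_roots; i++) {
		int c = roots[i];

		while (c >= 0) {
			assert((size_t)c < nr);
			if (c == target)
				return 1;
			c = commits[c].parent;
		}
	}
	return 0;
}
```

With commits 0 <- 1 <- 2 and the gitlink as the sole root, commit 2
is reachable while the index names it, and stops being reachable once
"git reset --hard" in the superproject puts the gitlink back to 1.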

Thanks.