Re: [PATCH 2/2] worktree.c: use submodule interface to access refs from another worktree

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Thu, 9 Feb 2017 09:04:58 +0100

On 02/09/2017 07:55 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
> 
>> There are two meanings of the concept of a "ref store", and I think this
>> change muddles them:
>>
>> 1. The references that happen to be *physically* stored in a particular
>>    location, for example the `refs/bisect/*` references in a worktree.
>>
>> 2. The references that *logically* should be considered part of a
>>    particular repository. This might require stitching together
>>    references from multiple sources, for example `HEAD` and
>>    `refs/bisect` from a worktree's own directory with other
>>    references from the main repository.
>>
>> Either of these concepts can be implemented via the `ref_store` abstraction.
>> ...
>> The `ref_store` that you want here for a worktree is not the worktree's
>> *logical* `ref_store`. You want the worktree's *physical* `ref_store`.
>> Mixing logical and physical reference stores together is a bad idea
>> (even if we were willing to ignore the fact that worktrees are not
>> submodules in the accepted sense of the word).
> 
> I am not quite sure what mental model you are suggesting as a
> preferred solution.  We can
> 
>  - represent a set of refs stored for a particular worktree
>    (i.e. HEAD, refs/bisect, and refs/worktree/<name>, iirc), as
>    a bunch of ref_stores;
> 
>  - represent a set of refs shared across a set of worktrees that
>    share the primary one, as another ref_store;
> 
>  - a caller who wants a "logical" view of a single worktree can
>    pick one of the first kind, union it with the second one, and
>    represent the result as a (synthetic) ref_store.
> 
> The third one is "stitching together from multiple sources".  By
> "mixing logical and physical is a bad idea", do you mean that the
> same abstraction "ref_store" should not be used for the first two
> (which are physical) and the third one (which is logical)?  Do you
> want to call the first two "physical_ref_store" and the last one
> "ref_store" and keep them distinct?

The existing `ref_store` abstraction, I think, is capable of
representing either kind of reference store. The stitching together to
get the "logical" view of a worktree should probably happen within the
refs code rather than forcing callers to deal with it. But yes, I think
that code should put together a compound `ref_store` object that
delegates to multiple underlying `ref_store` objects as you've described.
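For concreteness, here is a rough C sketch of such a compound object. It is not git's code: it assumes a drastically simplified, hypothetical `struct ref_store` whose vtable has a single `read_ref` method (git's real ref stores carry far more than this), and it assumes that only `HEAD` and `refs/bisect/*` are stored per worktree. The names `table_ref_store`, `compound_ref_store`, `is_per_worktree_ref()` and `store_read_ref()` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ref_store;

/* Hypothetical, heavily simplified vtable (one method only). */
struct ref_store_vtable {
	/* Return the object id the ref points at, or NULL if it does not exist. */
	const char *(*read_ref)(struct ref_store *store, const char *refname);
};

/* Every store, physical or logical, starts with this header. */
struct ref_store {
	const struct ref_store_vtable *vt;
};

/* Physical store: a fixed table of {refname, object id} pairs. */
struct table_ref_store {
	struct ref_store base; /* first member, so the casts below are valid */
	const char *const (*refs)[2];
	size_t nr;
};

static const char *table_read_ref(struct ref_store *store, const char *refname)
{
	struct table_ref_store *t = (struct table_ref_store *)store;

	for (size_t i = 0; i < t->nr; i++)
		if (!strcmp(t->refs[i][0], refname))
			return t->refs[i][1];
	return NULL;
}

static const struct ref_store_vtable table_vtable = { table_read_ref };

/* Assumption: only HEAD and refs/bisect/ live in a worktree's own directory. */
static int is_per_worktree_ref(const char *refname)
{
	return !strcmp(refname, "HEAD") ||
	       !strncmp(refname, "refs/bisect/", strlen("refs/bisect/"));
}

/* Logical store: routes each lookup to one of two physical stores. */
struct compound_ref_store {
	struct ref_store base;
	struct ref_store *worktree; /* refs stored in the worktree's directory */
	struct ref_store *shared;   /* refs stored in the main repository */
};

static const char *compound_read_ref(struct ref_store *store, const char *refname)
{
	struct compound_ref_store *c = (struct compound_ref_store *)store;
	struct ref_store *target =
		is_per_worktree_ref(refname) ? c->worktree : c->shared;

	return target->vt->read_ref(target, refname);
}

static const struct ref_store_vtable compound_vtable = { compound_read_ref };

/* Callers only go through this and never care which kind of store they hold. */
static const char *store_read_ref(struct ref_store *store, const char *refname)
{
	assert(store && refname);
	return store->vt->read_ref(store, refname);
}
```

Because both kinds share the same header, a caller holding a `struct ref_store *` cannot tell (and need not care) whether it is physical or logical. A lookup of `HEAD` through the compound store is answered by the worktree's own store, while `refs/heads/*` lookups fall through to the main repository.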

Which kind of `ref_store *` you have in your hand would depend on where
you got it. If you call the hypothetical `get_submodule_refs()`
function, you would get a `ref_store *` representing the references that
are logically visible from that submodule. There might be a separate
`get_worktree_specific_refs()` that returns a `ref_store *` representing
the worktree-specific references physically stored for the worktree. But
maybe the latter is not even necessary; see below.
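To illustrate the difference between the two kinds: a physical store is just whatever lives in one place, whereas the refs logically visible from a worktree have to be stitched together at iteration time. A sketch, with made-up names (`ref_table`, `for_each_logical_ref()`, `count_ref()`) and the same assumption that only `HEAD` and `refs/bisect/*` are per-worktree: the worktree supplies its own per-worktree refs, the main repository supplies everything else, and the main repository's own `HEAD` is hidden.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type, loosely modelled on git's each_ref_fn. */
typedef int each_ref_cb(const char *refname, const char *oid, void *data);

/* Hypothetical physical store: the {refname, object id} pairs in one place. */
struct ref_table {
	const char *const (*refs)[2];
	size_t nr;
};

/* Assumption: only HEAD and refs/bisect/ live in a worktree's own directory. */
static int is_per_worktree_ref(const char *refname)
{
	return !strcmp(refname, "HEAD") ||
	       !strncmp(refname, "refs/bisect/", strlen("refs/bisect/"));
}

/* Visit the refs in one table whose per-worktree-ness matches want_per_worktree. */
static int for_each_in_table(const struct ref_table *t, int want_per_worktree,
			     each_ref_cb *cb, void *data)
{
	for (size_t i = 0; i < t->nr; i++) {
		const char *refname = t->refs[i][0];
		int ret;

		if (is_per_worktree_ref(refname) != want_per_worktree)
			continue;
		ret = cb(refname, t->refs[i][1], data);
		if (ret)
			return ret; /* a nonzero return stops the iteration */
	}
	return 0;
}

/*
 * The refs logically visible from one worktree: per-worktree names from the
 * worktree's own directory, all other names from the main repository. The
 * main repository's own HEAD and refs/bisect/ are hidden.
 */
static int for_each_logical_ref(const struct ref_table *worktree,
				const struct ref_table *shared,
				each_ref_cb *cb, void *data)
{
	int ret;

	assert(worktree && shared && cb);
	ret = for_each_in_table(worktree, 1, cb, data);
	if (ret)
		return ret;
	return for_each_in_table(shared, 0, cb, data);
}

/* Example callback: count the refs visited; data points to an int. */
static int count_ref(const char *refname, const char *oid, void *data)
{
	(void)refname;
	(void)oid;
	(*(int *)data)++;
	return 0;
}
```

A caller handed the logical view sees exactly one `HEAD`; a caller handed a physical table sees only what is stored in that one location.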

> For the purpose of anchoring objects in the object store shared by
> multiple worktrees, we can either iterate over all the ref_stores
> of the first two kinds, or iterate over all the ref_stores of the
> third kind for all worktrees.  The latter of course is less
> efficient as the enumeration
> 
> 	for worktree in all worktrees:
> 		for ref in get_ref_store(worktree):
> 			mark tip of ref reachable
> 
> will work on all the shared refs multiple times, but as an
> abstraction that may be simpler.  The alternative of working at the
> physical level would be more efficient
> 
> 	for worktree in all worktrees:
> 		for ref in get_ref_store_specific_to_worktree(worktree):
> 		mark tip of ref reachable
> 	for ref in get_ref_store_shared_across_worktrees():
> 		mark tip of ref reachable
> 
> but this consumer now _knows_ how the logical ref_store of a
> worktree is constructed (i.e. by combining the two ref_stores),
> which appears as a layering violation.
> 
> I am however not sure if these issues are what you are driving at,
> and what exact design you are suggesting.

Reachability is a special case, because it needs all of the references
that refer to a particular object store, even though the reference names
might overlap. I personally think that reachability roots should be
requested via a new refs API call separate from `for_each_rawref()` (or
whatever is used now). Internally it would be implemented much like your
second "efficient" algorithm, but the implementation would be within the
refs code, and the caller could remain ignorant of those details.
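Roughly what I have in mind, as a C sketch with hypothetical names (`for_each_reachability_root()`, `struct physical_refs`, `root_cb`); this is not an existing git API. Each physical location (every linked worktree's own directory, then the main repository) is walked exactly once, so shared refs are not visited repeatedly as they would be when walking every worktree's logical view.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: worktree is NULL for refs stored in the main repository. */
typedef int root_cb(const char *worktree, const char *refname,
		    const char *oid, void *data);

/* Hypothetical physical store of one location: {refname, object id} pairs. */
struct physical_refs {
	const char *worktree; /* linked worktree name, or NULL for the main repo */
	const char *const (*refs)[2];
	size_t nr;
};

/* Report every ref physically stored in one location. */
static int for_each_physical(const struct physical_refs *p, root_cb *cb, void *data)
{
	for (size_t i = 0; i < p->nr; i++) {
		int ret = cb(p->worktree, p->refs[i][0], p->refs[i][1], data);
		if (ret)
			return ret;
	}
	return 0;
}

/*
 * Hypothetical refs-internal API: report every reachability root once, by
 * walking each linked worktree's own refs and then the main repository's
 * refs. Callers never learn how the logical views are composed.
 */
static int for_each_reachability_root(const struct physical_refs *worktrees,
				      size_t nr_worktrees,
				      const struct physical_refs *main_repo,
				      root_cb *cb, void *data)
{
	int ret;

	assert(main_repo && cb);
	for (size_t i = 0; i < nr_worktrees; i++) {
		ret = for_each_physical(&worktrees[i], cb, data);
		if (ret)
			return ret;
	}
	return for_each_physical(main_repo, cb, data);
}

/* Example callback: count roots; a real caller would mark oid as reachable. */
static int count_root(const char *worktree, const char *refname,
		      const char *oid, void *data)
{
	(void)worktree; (void)refname; (void)oid;
	(*(int *)data)++;
	return 0;
}
```

With this shape, a branch in the main repository is reported once in total rather than once per worktree, and the caller still only sees a single iteration function.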

Externally, the new API might not even want to pass the caller the real
reference names (I assume that callers use the reference names mainly for
diagnostic messages). For example, it might want to report references
`HEAD` and `refs/bisect/bad` in worktree `foo` under the pseudonyms
`worktree/foo/HEAD` and `worktree/foo/refs/bisect/bad`, so that they can
be distinguished from any homonyms in the main repo and in other
worktrees. If you ask for the reachability roots while in a worktree, it
would either automatically crawl up to the main repository and across to
sibling worktrees to get the full set of reachability roots, or maybe it
would refuse to run at all (if we want to require such commands to be
executed from the main repo).
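The pseudonyms could be produced by a small helper like the following. The name `format_root_name()` and the choice to leave main-repository refs under their real names are assumptions made for the sake of the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the name under which a reachability root is
 * reported. Refs from linked worktree <wt> become "worktree/<wt>/<refname>";
 * refs from the main repository (worktree == NULL) keep their real names.
 * Returns 0 on success, -1 if buf is too small (the result is truncated).
 */
static int format_root_name(char *buf, size_t len, const char *worktree,
			    const char *refname)
{
	int n;

	assert(buf && refname);
	if (worktree)
		n = snprintf(buf, len, "worktree/%s/%s", worktree, refname);
	else
		n = snprintf(buf, len, "%s", refname);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Since `snprintf()` returns the length the full string would have had, comparing it against the buffer size is enough to detect truncation.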

I don't know exactly who the consumers of the reachability roots would
be, so maybe there are some problems with this suggestion.

Michael