Re: [RFC] t7410: 210 tests for various 'git submodule update' scenarios

"W. Trevor King" <wking@xxxxxxxxxx> · Thu, 17 Apr 2014 08:31:09 -0700

On Thu, Apr 17, 2014 at 01:42:42PM +0200, Johan Herland wrote:
> >> +# T2: Test with submodule.<name>.url != submodule's remote.origin.url. Does
> >> +#     "submodule update --remote" sync with submodule.<name>.url, or with the
> >> +#     submodule's origin? (or with the submodule's current branch's upstream)?
> >
> > All fetches should currently use the submodule's remote.origin.url.
> > submodule.<name>.url is only used for the initial clone (*.*.*.1), and
> > never referenced again.  This would change using my tightly-bound
> > submodule proposal [1], where a difference between
> > submodule.<name>.url and the submodule's @{upstream} URL would
> > trigger a dirty-tree condition (for folks with tight-bind syncing
> > enabled) from which you couldn't update before resolving the
> > difference.
> 
> Ok. As stated above, I am worried about the amount of duplicated
> state between the superproject's submodule config (which itself is
> split between .gitmodules and .git/config) and the submodule's own
> config. And from the above paragraph, I suspect two more dimensions
> need to be added to the test matrix:
> 
>  - submodule's remote.origin.url ==/!= submodule.<name>.url
> 
>  - "tightly-bound submodule" is enabled/disabled

Tight-binding hasn't been implemented yet, or even accumulated much
support from other folks ;).  However, the idea is to unify the state
between the superproject's .gitmodules and .git/config and the
submodule's .git/config (or ../.git/modules/<name>/config, or
whatever).  Then folks with tightly-bound syncing enabled have only
one state space to maintain (and get auto-updates for each
superproject checkout), and folks who opt-out of tightly-bound syncing
are presumably embracing the complexity of our current system, with
its two confusingly aligned configuration spaces.
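
To make the duplicated state concrete, here is a minimal sketch that
prints the three places a submodule URL currently lives.  The function
name show_submodule_urls is hypothetical, and the submodule name and
path are whatever the caller passes in; only 'git config' and 'git -C'
are real Git interfaces here.

```shell
# Hypothetical helper: print each copy of a submodule's URL.
# $1 = submodule name, $2 = submodule path (run from the superproject root).
show_submodule_urls () {
    # Shared, versioned setting; used for the initial clone.
    printf '.gitmodules: %s\n' \
        "$(git config -f .gitmodules --get "submodule.$1.url")"
    # Superproject-local copy written by 'git submodule init'.
    printf '.git/config: %s\n' \
        "$(git config --local --get "submodule.$1.url")"
    # What fetches inside the submodule actually use.
    printf 'submodule:   %s\n' \
        "$(git -C "$2" config --get remote.origin.url)"
}
```

Running 'show_submodule_urls <name> <path>' in a superproject shows at
a glance whether the three values have drifted apart.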

I'm happy to force syncing (i.e. no opting-out allowed) [1], but I
imagine there are folks who would resist ;).  Maybe a deprecation
period to help ease the transition?  This is all assuming that I get
more folks to buy into the tight-syncing ;).

The end-goal of my tightly-bound approach is to remove 'submodule
update' altogether and end up with a simpler interface [2]:

On Sat, Jan 11, 2014 at 05:08:47PM -0800, W. Trevor King wrote:
> * git submodule [--quiet] add [-b <branch>] [-f|--force] [--name <name>]
>                 [--reference <repository>] [--] <repository> [<path>]
> * git submodule [--quiet] init [--] [<path>...]
> * git submodule [--quiet] deinit [-f|--force] [--] <path>...
> * git submodule [--quiet] foreach [--recursive] <command>

All of this 'submodule update' integration confusion would be resolved
by the developer who updated the gitlink, and superproject checkouts
would just swap the local submodule branch/HEAD without having to
worry about clobbering uncommitted state.

On Thu, Apr 17, 2014 at 01:42:42PM +0200, Johan Herland wrote:
> We should instead seek ways to minimize the duplication of state.

The tightly-bound submodules I'm proposing try to use the submodule's
config settings (plus submodule.<name>.local-branch) as the familiar
language, while your proposal uses Git commands as the familiar
language.  I think both would work, but config settings are easier to
parse automatically, which helps with automatically syncing between
the superproject and submodule configs.  Syncing, in turn, helps
bridge the gap between the easily shared superproject/.gitmodules and
superproject/.git/modules/<name>/config (enabling familiar-to-use Git
commands in the submodule).

>  - submodule.<name>.create: …

Syncing submodule state back up into this is going to be a manual
operation.  For example, changing the submodule's remote.origin.url is
going to require hand-tweaking to update this setting.

>  - submodule.<name>.update: …
>     …
>     - 'git reset --hard $GITLINK'
>       Equivalent to checkout-mode (without --remote).
> 
>     - 'git fetch && git reset --hard origin/HEAD'
>       Equivalent to checkout-mode with --remote.

Folks who sometimes use --remote updates will still need non-remote
updates.  For example, if Alice and Bob are both developers on the
same superproject:

  alice$ git submodule update --recursive --remote # integrate upstream changes
  alice$ git commit -m 'Bumped submodules to upstream tips'
  alice$ git push
  bob$ git pull
  bob$ git submodule update --recursive # integrate Alice's gitlink changes

so it should be easy to toggle back and forth between the two
integration targets.  However:

  git fetch && git reset --hard origin/HEAD

is easy to run using 'git submodule foreach', or after changing into
the submodule directory, so I'm not particularly concerned here.
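
As a rough sketch of the 'git submodule foreach' variant (the wrapper
name reset_submodules_to_upstream_tips is made up for illustration, and
it assumes every submodule has an origin remote with origin/HEAD set,
as a fresh clone does):

```shell
# Hypothetical wrapper: move every submodule to its upstream tip.
# WARNING: 'git reset --hard' discards uncommitted changes in each submodule.
reset_submodules_to_upstream_tips () {
    git submodule foreach --recursive \
        'git fetch && git reset --hard origin/HEAD'
}
```

Toggling back to the gitlinked commits is then just a plain
'git submodule update --recursive'.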

With tight-binding and superproject-checkout-time auto-updates, the
above would be:

  alice$ git submodule foreach --recursive 'git pull'
  alice$ git commit -m 'Bumped submodules to upstream tips'
  alice$ git push
  bob$ git pull  # update to Alice's new gitlinks happens automatically

Another problem with a single submodule.<name>.update is if I want to
pull origin/HEAD into my submodule's master branch, but origin/dev
into my dev and feature-x branches (table and further discussion in
[3]).  That's not going to pack down into a single command.  In [3], I
lay out how you could set up per-superproject-branch configs for
.local-branch; you'd need something similar for .update.

> I now realize that my above arguments against increased complexity
> in submodule.<name>.* options arrive way too late, and are probably
> more like trolling than like constructive input.

I'm happy to have more input :).  It's hard to imagine an interface so
polished that it can't be improved, and the current submodule
interface is certainly well short of that hypothetical goal ;).

> >> +# D6: The meaning of submodule.<name>.branch is initially confusing, as it does
> >> +#     not really concern the submodule's local branch (except as a naming hint
> >> +#     when the submodule is first cloned). Instead, submodule.<name>.branch is
> >> +#     really about which branch in the _upstream_ submodule
> >
> > Which is how gitmodules(5) explains it:
> >
> >   submodule.<name>.branch
> >     A remote branch name for tracking updates…
> 
> Good, but I fear gitmodules(5) is too hidden for the regular user.
> It'd be better to mention this in git-submodule(1), as I expect
> gitmodules(5) is largely read by .gitmodules _authors_, and not
> regular users. Obviously, the real fix would be a better name...

I'm fine with a rename to .remote-branch.  Migrating through a config
rename should be pretty easy to do, but it's going to make future
changes more difficult until the migration wraps up $n releases in the
future.  I'd prefer jumping straight to my tight-binding approach,
though, where --remote updates are replaced by the more familiar:

  $ git submodule foreach [--recursive] 'git pull …'

> >> +#     submodule.<name>.url, or by the submodule's remote.origin.url?)
> >> +#     want to integrate with.
> >
> > The submodule's remote.origin.url for everything except the initial
> > clone (*.*.*.1).  See my response to T2.
> 
> As mentioned above, submodule.<name>.url is then an unnecessary state
> duplication.

It's not unnecessary before you've cloned.  And you'll also want to
update it if you change the submodule's remote.origin.url, so other
folks (who haven't cloned yet) aren't stranded with the old URL.
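
A minimal sketch of that write-back step, assuming you have already
changed remote.origin.url inside the submodule.  The function name
record_submodule_url is hypothetical; the Git commands it calls are
ordinary 'git config' and 'git add'.

```shell
# Hypothetical helper: copy the submodule's current remote.origin.url
# back into the superproject's .gitmodules so fresh clones see it.
# $1 = submodule name, $2 = submodule path (run from the superproject root).
record_submodule_url () (
    url=$(git -C "$2" config --get remote.origin.url) &&
    git config -f .gitmodules "submodule.$1.url" "$url" &&
    git add .gitmodules   # stage the change; committing is left to the caller
)
```

Note that 'git submodule sync' goes the other way (from .gitmodules
into the local configs), so it doesn't help with this direction.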

> >> +# D7: What to do when .branch refers to a branch that is missing from upstream?
> >> +#     Currently, when trying to clone, the clone fails (which causes 'git
> >> +#     submodule update --remote' to fail), but leaves the submodule in an
> >> +#     uninitialized state (there is a .git, but the work tree is missing).
> >> +#     This is probably not the behavior we want...
> >> +#     Affects: pre, 3.2.2.1, 3.3.2.1, 3.4.2.1, 3.5.2.1
> >
> > I think we should remove the submodule's .git file after the failed clone.
> 
> Agreed, but does that extend to the superproject's .git/modules/<name> too?

That's what 'git clone' usually does, so yes.  Ideally we'd add a
check to the initial clone that bails out before fetching tons of
objects if the referenced branch or commit is not present on the
remote.
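
For the branch case, such a pre-clone check could look roughly like
this.  It is only a sketch of the idea, not what 'git submodule'
currently does, and remote_has_branch is a made-up name; it relies on
'git ls-remote --exit-code', which exits non-zero when no matching ref
is found.

```shell
# Hypothetical pre-flight check: does <url> advertise refs/heads/<branch>?
# $1 = remote URL, $2 = branch name.
remote_has_branch () {
    # Only asks the remote for its ref advertisement; no objects are fetched.
    git ls-remote --exit-code "$1" "refs/heads/$2" >/dev/null
}
```

A caller could then run 'remote_has_branch "$url" "$branch" || exit 1'
before starting the clone.  Checking for an arbitrary commit is harder,
since the remote only advertises ref tips.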

Cheers,
Trevor

[1]: http://article.gmane.org/gmane.comp.version-control.git/240370
[2]: http://article.gmane.org/gmane.comp.version-control.git/240336
[3]: http://article.gmane.org/gmane.comp.version-control.git/240251

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy