Re: Shallow submodule efficiency

On Mon, Jun 27, 2016 at 10:39 PM, Martin von Gagern
<Martin.vGagern@xxxxxxx> wrote:
> Hi!
>
> I have the feeling that “git submodule update --depth 1” is less clever
> than it could be. Here is one example I observed with git 2.0.0:

Did you mean 2.9.0? The "Direct fetching of" message is not part of 2.0.0, IIRC.

>
>   git init foo
>   cd foo
>   git clone --single-branch \
>             -b v0.99 https://github.com/git/git.git git-scm
>   git submodule add https://github.com/git/git.git git-scm
>   git commit -m Submod
>   git clone --dissociate . ../bar
>   cd ../bar
>   git submodule update --init --depth 1 git-scm
>
> This will download quite a bit of history, then result in an error message:
>
>   error: no such remote ref a3eb250f996bf5e12376ec88622c4ccaabf20ea8
>   Fetched in submodule path 'git-scm', but it did not contain
>   a3eb250f996bf5e12376ec88622c4ccaabf20ea8. Direct fetching of that
>   commit failed.

Yeah, there are a few things going on here, which try to cover up what
is a design error IMO.

* The depth is measured from the tip of a branch in the submodule,
  not from the sha1 that the superproject points to.
* Shallowness is treated separately in the superproject and in the
  submodules, as they have a strong notion of being independent. It
  would be cool to have something like
  `git clone --recurse-submodules --depth=15 --submodule-depth-as-reachable-from-superproject`
  which would obtain the submodules as shallow as possible, while still
  including all versions that the 15 commits in the superproject point
  to (anywhere from 1 to 15 different, non-sequential versions).
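
To see the mismatch from the first point, it may help to compare the
commit the superproject records with what a depth-1 fetch would
deliver. A minimal sketch, assuming a POSIX shell;
`recorded_submodule_commits` is a helper name made up for this
illustration, not something git ships:

```shell
# Read `git ls-tree` output on stdin and print "<sha1> <path>" for every
# gitlink entry (mode 160000), i.e. every commit the superproject records
# for a submodule. Hypothetical helper, for illustration only.
recorded_submodule_commits() {
    while read -r mode type sha path; do
        if [ "$mode" = 160000 ]; then
            printf '%s %s\n' "$sha" "$path"
        fi
    done
    return 0
}

# Usage, from inside the superproject:
#   git ls-tree -r HEAD | recorded_submodule_commits
# Compare with the tip a depth-1 clone of the default branch would get:
#   git ls-remote https://github.com/git/git.git HEAD
```

If the two ids differ, a clone of depth 1 from the branch tip cannot
contain the recorded commit.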


>
> That seems so avoidable, since the commit in question is a tag, so it
> would be perfectly possible to fetch that specific commit from the
> server directly. Something like the following commands would do the trick:
>
>   git fetch $url $(git ls-remote $url | \
>                    awk /$sha1/'{print $2}' | sed 's/\^{}//')
>

* `git submodule update --init --depth 1` currently uses clone instead
  of fetch when the submodule doesn't exist yet. The clone is buried in
  `submodule--helper update-clone`, which is a mixture of listing the
  submodules and cloning multiple submodules in parallel if possible.
  So I would assume it is easier to teach git clone to behave correctly
  and then stop retrying in git-submodule.sh if `just_cloned` is set in
  `cmd_update()`.

> If the commit in question is NOT a ref, then whether asking for it by
> unlisted SHA1 is supported will probably depend on the server's
> uploadpack.allowReachableSHA1InWant setting. I guess this is a reason
> why fb43e31 made the fetch for a specific SHA1 a fallback after the
> fetch for the default branch. Nevertheless, in case of “--depth 1” I
> think it would make sense to abort early: if none of the listed refs
> matches the requested one, and asking by SHA1 isn't supported by the
> server, then there is no point in fetching anything, since we won't be
> able to satisfy the submodule requirement either way.

Makes sense! I think the easiest way forward to implement this will be:

* `git clone` learns a (maybe undocumented, internal) option `--get-sha1`.
  `--branch` looks similar to what we want, but doesn't quite fit, as we
  do not know whether we're on a tag or not. The submodule tells us just
  the recorded sha1, not the branch/tag. So maybe we'd end up calling it
  `--detach-at=<sha1>`, which will
  -> inspect the ls-remote output for the sha1
  -> if the sha1 is there (at least once), clone as if --branch <tag> was given
  -> if it is not found and the server advertised allowReachableSHA1InWant,
     try again inside the clone

* `submodule--helper update-clone` passes `--get-sha1` to the clones of
  the submodules.

* cmd_update() in git-submodule.sh will only check out submodules, and
  not try to fetch them again, if `just_cloned` is set, as the cloning
  did the best it could.
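
The three `->` steps can be approximated today with existing commands.
A rough sketch, not how `git clone` implements anything: `ref_for_sha1`
and `clone_at_sha1` are hypothetical helper names, only branches and
tags are considered, and instead of checking whether the server
advertises allowReachableSHA1InWant the fallback simply attempts the
fetch and lets the server refuse:

```shell
# Read `git ls-remote` output on stdin; print the short name of the first
# branch or tag whose object id (peeled, for annotated tags) equals $1.
# Returns non-zero if no listed branch or tag matches.
ref_for_sha1() {
    want=$1
    while read -r sha ref; do
        [ "$sha" = "$want" ] || continue
        ref=${ref%'^{}'}                 # drop the peeled-tag suffix
        case $ref in
            refs/heads/*) printf '%s\n' "${ref#refs/heads/}"; return 0 ;;
            refs/tags/*)  printf '%s\n' "${ref#refs/tags/}";  return 0 ;;
        esac
    done
    return 1
}

# clone_at_sha1 <url> <sha1> <dir>
clone_at_sha1() {
    url=$1 want_sha=$2 dir=$3
    if name=$(git ls-remote "$url" | ref_for_sha1 "$want_sha"); then
        # The sha1 is advertised: a depth-1 clone of that ref contains it.
        git clone --depth 1 --branch "$name" "$url" "$dir"
    else
        # Not advertised: ask for the sha1 directly. This only works if
        # the server permits fetching unadvertised objects.
        git init "$dir" &&
        git -C "$dir" fetch --depth 1 "$url" "$want_sha" &&
        git -C "$dir" checkout --detach FETCH_HEAD
    fi
}

# Usage (values taken from the report above):
#   clone_at_sha1 https://github.com/git/git.git \
#       a3eb250f996bf5e12376ec88622c4ccaabf20ea8 git-scm
```

Matching against the peeled `^{}` line matters for annotated tags,
since the superproject records the commit, not the tag object.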


>
> For the case of “--depth n” with n > 1, I was wondering whether it would
> make sense to prefer the branch listed in submodule.‹name›.branch over
> the default branch.

Makes sense to me.

>
> I think shallow submodules would be very useful to embed libraries into
> projects, without too much care for history (and without the download
> times getting it entails), but with efficient updates to affected files
> only in case of a change in library version. But not being able to get a
> specific tag as a shallow submodule is a major showstopper here, I think.

Thanks for taking the time to point this out and for starting this discussion!

Thanks,
Stefan

>
> Greetings,
>  Martin von Gagern
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html