On Wed, Jan 6, 2016 at 2:26 PM, Eric Curtin <ericcurtin17@xxxxxxxxx> wrote:
>
> Often I do a standard git clone:
>
> git clone (name of repo)
>
> Followed by a depth=1 clone in parallel, so I can get building and
> working with the code asap:
>
> git clone --depth=1 (name of repo)
>
> Could we change the default behavior of git so that we initially get
> all the current files quickly, so that we can start working on them,
> and then get the rest of the data? At least a user could get to work
> quicker this way. Any disadvantages of this approach?

It would put more burden on a shared and limited resource (i.e. the
server side).

For example, I just tried a depth=1 clone of Linus's repository from

  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

which transferred ~150MB of pack data to check out 52k files in 90
seconds. On the other hand, a full clone transferred ~980MB of pack
data and took 170 seconds to complete.

You can already see that a full clone is highly optimized: it does not
take even twice the time of grabbing just the most recent checkout to
fetch 10 years' worth of development (562k commits).

This efficiency comes from some tradeoffs, one of which is that not all
the data necessary to check out the latest tree contents can be stored
near the beginning of the pack data. So "check out the tip while the
remainder of the data is still incoming" would not be workable unless
you are willing to destroy the full-clone performance.
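(For anyone who wants to reproduce a comparison like the one above, a
rough sketch; the kernel.org URL is the one quoted in the message, the
directory names linux-shallow and linux-full are just illustrative, and
the exact numbers will of course vary with network and server load:

  # shallow clone: fetch only the objects needed for the tip commit
  time git clone --depth=1 \
    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-shallow

  # full clone: fetch the complete history
  time git clone \
    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-full

  # compare how much pack data each clone actually received
  du -sh linux-shallow/.git/objects/pack linux-full/.git/objects/pack

A shallow clone can also be deepened in place later with
"git fetch --unshallow", which gets the remaining history without the
second, parallel full clone.)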