Re: Problem with --shallow-submodules option

On Thu, Jun 30, 2016 at 6:27 AM, Istvan Zakar <istvan.zakar@xxxxxxxxx> wrote:
> Hello,
>
> Thanks for your answers. I tested it after the changes were made on
> the git server, and it seems to be working. But some other issue came
> up.
>
> We have quite a lot of submodules in our project, so I did a comparison:
>
> If I do a clone with these parameters:
> --jobs 20 --recurse-submodules
>
> The clone lasts ~53 seconds, and the total size of the folder is around 2 GB.
>
> If I add the shallow-submodules option, the size of the folder will be
> a bit below 1GB, so the size decreased as I expected, but the time of
> the clone itself increased to 90 seconds. It seems the last step of
> the command, checking out the submodules is executed one-by-one, and
> not in parallel, so it seems at this step the jobs parameter does not
> have effect.
>
> Is it intentional, or is there some option I missed?

It was intentional at the time of submitting the patches.
The checkout phase is a bit complicated as it combines the
newly cloned submodules as well as the submodules to incrementally
fetch into one bucket and treats them the same.

And for submodules that were fetched incrementally you may run into problems
when combining that with the local state (e.g. rebase or merge configured in
`submodule.<name>.update` or passed on the command line), which requires
human interaction (resolving the merge conflict), which we want to present one
at a time to the user.

From the user's point of view it is not quite clear when we should stop; see:
15ffb7cde48b73b3d5ce259443db7d2e0ba13750 (submodule update: continue when a checkout fails)
877449c136539cf8b9b4ed9cfe33a796b7b93f93 (git-submodule.sh: clarify the "should we die now" logic)

So we want to die as soon as we see a merge conflict or other
error that is likely to require some human interaction.
To do that properly we need to have complicated logic or just update
one submodule at a time.
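
To make the "one submodule at a time" option concrete, here is a rough
sketch in Python (not the actual git-submodule.sh code). The names
`update_serially` and `update_one` are made up for illustration; I assume
`update_one` reports whether the update succeeded and, if it did not,
whether the failure needs a human (the must_die_on_failure case).

```python
from typing import Callable, List, Optional, Tuple


def update_serially(
    names: List[str],
    update_one: Callable[[str], Tuple[bool, bool]],
) -> Tuple[List[str], List[str], Optional[str]]:
    """Update submodules one at a time.

    update_one(name) is a hypothetical callback returning (ok, must_die).
    Returns (updated, failed_but_continued, stopped_at).
    """
    updated: List[str] = []
    failed: List[str] = []
    for name in names:
        ok, must_die = update_one(name)
        if ok:
            updated.append(name)
        elif must_die:
            # Likely needs human interaction (e.g. a merge conflict):
            # stop here so the user sees exactly one problem.
            return updated, failed, name
        else:
            # A failure we can report later; keep going.
            failed.append(name)
    return updated, failed, None
```

Because the loop is strictly serial, stopping at the first such failure
is trivial; doing the same with parallel workers is where the
complicated logic would come in.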

For initial checkouts we know that there will be no merge conflicts, i.e.
each one will be a "checkout -f" (with an implicit must_die_on_failure=no).
So we could run all initial checkouts of submodules in parallel, too. We'd
just need to write the patch for that.

As the cloning is already done in parallel, we can hook into the initial
checkout there easily. I'd build that on top of [1], creating a similar commit.
In the successful case of `update_clone_task_finished` (the case with
`!result`  -> return 0;) we would need to add the checkout command to
the queue instead of just finishing.

[1] https://github.com/gitster/git/commit/665b35eccd39fefd714cb5c332277a6b94fd9386
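
As a rough illustration of that idea (in Python rather than the C code in
submodule--helper, and not a patch), the sketch below queues a checkout
whenever a clone finishes successfully instead of just finishing. The
function `clone_then_checkout` and the `clone`/`checkout` callbacks are
hypothetical; I assume each returns an exit code, with 0 meaning success.

```python
import concurrent.futures as cf
from typing import Callable, Dict, Iterable, Tuple


def clone_then_checkout(
    names: Iterable[str],
    clone: Callable[[str], int],
    checkout: Callable[[str], int],
    jobs: int = 4,
) -> Dict[str, Tuple[str, int]]:
    """Run clones in parallel; queue a checkout after each successful clone.

    Returns name -> (last phase run, exit code of that phase).
    """
    results: Dict[str, Tuple[str, int]] = {}
    with cf.ThreadPoolExecutor(max_workers=jobs) as pool:
        pending = {pool.submit(clone, n): (n, "clone") for n in names}
        while pending:
            finished, _ = cf.wait(list(pending), return_when=cf.FIRST_COMPLETED)
            for fut in finished:
                name, phase = pending.pop(fut)
                rc = fut.result()
                if phase == "clone" and rc == 0:
                    # The "!result" case: add the checkout to the queue
                    # instead of just finishing.
                    pending[pool.submit(checkout, name)] = (name, "checkout")
                else:
                    results[name] = (phase, rc)
    return results
```

This only covers newly cloned submodules; the ones that are fetched
incrementally would still go through the serial path.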


>
> I'm using git 2.9.0 on client side.
>
> Thanks,
>    Istvan
>
> ps: if I update the submodules with --depth 1 parameter in parallel
> using xargs it lasts about 18 seconds, so it's a workaround for this
> issue, but it would be nice to do it with a single command.
>
>
>
>
> On 22 June 2016 at 17:31, Fredrik Gustafsson <iveqy@xxxxxxxxx> wrote:
>> On Mon, Jun 20, 2016 at 01:06:39PM +0000, Istvan Zakar wrote:
>>> I'm working on a relatively big project with many submodules. During
>>> cloning for testing I tried to decrease the amount of data that needs to be
>>> fetched from the server by using --shallow-submodules option in the clone
>>> command. It seems to check out the tip of the remote repo, and if it's not
>>> the commit registered in the superproject the submodule update fails
>>> (obviously). Can I somehow tell to fetch that exact commit I need for my
>>> superproject?
>>
>> Maybe. http://stackoverflow.com/questions/2144406/git-shallow-submodules
>> gives a good overview of this problem.
>>
>> git fetches a branch and is shallow from that branch, which might be
>> a different sha1 than the one the submodule points to (as you say). This
>> is/was one of the drawbacks with this method. However, since git 2.8,
>> git will try to fetch the sha1 directly (and not the branch). So then it
>> will work, if(!), the server supports direct access to sha1. This was
>> previously not allowed due to security concerns (if I recall correctly).
>>
>> So the answer is, yes this will work if you've a recent version of git
>> and support on the server side for doing this. Unfortunately I'm not
>> sure which git version is needed on the server side for this to work.
>>
>> --
>> Fredrik Gustafsson
>>
>> phone: +46 733-608274
>> e-mail: iveqy@xxxxxxxxx
>> website: http://www.iveqy.com
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html