Re: Stalled git cloning and possible solutions

"V.Krishn" <vkrishn4@xxxxxxxxx> · Wed, 4 Sep 2013 06:36:48 +0530

On Friday, August 30, 2013 03:48:44 AM you wrote:
> "V.Krishn" <vkrishn4@xxxxxxxxx> writes:
> > On Friday, August 30, 2013 02:40:34 AM you wrote:
> >> V.Krishn wrote:
> >> > Quite often, cloning a large repo stalls; hitting Ctrl+c then
> >> > cleans up what has been downloaded, and the process needs a restart.
> >> > 
> >> > Is there a way to recover or continue from already downloaded files
> >> > during cloning ?
> >> 
> >> No, sadly.  The pack sent for a clone is generated dynamically, so
> >> there's no easy way to support the equivalent of an HTTP Range request
> >> to resume.  Someone might implement an appropriate protocol extension
> >> to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
> >> but for now it doesn't exist.
> > 
> > This is what I tried but then realized something more is needed:
> > 
> > During a stalled clone, avoid Ctrl+c.
> > 1. Copy the content, i.e. the .git folder, to some other place.
> > 2. cd <new dir>
> > 3. git config fetch.unpackLimit 999999
> > 4. git config transfer.unpackLimit 999999
> 
> These two steps will not help, as negotiation between the sender and
> the receiver is based on the commits that are known to be complete,
> and an earlier failed "fetch" will not (and should not) update refs
> on the receiver's side.
> 
> >> What you *can* do today is create a bundle from the large repo
> >> somewhere with a reliable connection and then grab that using a
> >> resumable transport such as HTTP.
> 
> Yes.
> 
> Another possibility is, if the project being cloned has a tag (or a
> branch) that points at a commit back when it was smaller, do this
> 
> 	git init x &&
> 	cd x &&
> 	git fetch $that_repository \
> 		$that_tag:refs/tags/back_then_i_was_small
> 
> to prime the object store of a temporary repository 'x' with a
> hopefully smaller transfer, and then use it as a "--reference"
> repository to the real clone.
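
If I read the suggestion above correctly, the two steps together would look
roughly like this. It is only a sketch: the function name prime_and_clone and
its arguments are my own illustration, and the repository should be given as
a URL or an absolute path.

```shell
#!/bin/sh
# Sketch only: prime a scratch repository 'x' with an old, small tag, then
# borrow its objects when making the real clone.
# $1 = repository (URL or absolute path)
# $2 = an old tag (or branch) from when the project was smaller
# $3 = directory for the real clone
prime_and_clone() {
    repo=$1 old_ref=$2 target=$3

    # Step 1: a smaller transfer that is more likely to complete.
    git init -q x &&
    (cd x &&
     git fetch -q "$repo" "$old_ref:refs/tags/back_then_i_was_small") &&

    # Step 2: the real clone; objects already present in x are borrowed
    # through .git/objects/info/alternates instead of being sent again.
    git clone -q --reference "$PWD/x" "$repo" "$target"
}
```

Note that the real clone keeps depending on 'x' for those objects, so 'x'
must not be deleted unless the clone is repacked to stand on its own first.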

What additional files or information would be needed?
I noticed that tmp_pack_xxxxxx may not contain any objects of type commit or tree.
Do I need to create .git/refs.. manually?

I was wondering whether the following would further help with recovery.

A
1. The pack file is created in commit-history (date) order, i.e.
   blob+commit+tree....tags...+blob+commit+tree,
   and an idx (or at least a tmp idx) is created in parallel.
2. The other files in the .git dir are updated before the pack process
   (as stated in my previous email).
3. Objects are named like datestamp(epoch)+sha1
   and stored in an epoch directory (the date fmt can be yymmdd).
   (This might break back-compat.)
4. Add "git fsck --defrag [1..4]"
   # this could take another parameter, such as a level,
   # applying various heuristic optimizations.

B
Another option would be:
git clone <url> --use-method=rsync
This would transfer the (necessary) files in the .git dir as they are,
and run `git gc` or other housekeeping upon completion.
This method would allow resuming.
Cons:
  Any change to a pack file on the server during the download becomes a
  potential issue.
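
Something close to B can be tried by hand today where rsync access to the
repository exists. The sketch below is only an illustration: the function
name copy_git_dir is made up, and it copies the repository as-is rather than
making a true clone (no origin remote gets configured).

```shell
#!/bin/sh
# Sketch only: copy a repository's .git dir with rsync, then check it.
# $1 = source repository (local path, or host:path over ssh)
# $2 = target directory
copy_git_dir() {
    mkdir -p "$2/.git" &&

    # --partial keeps a half-transferred file, so running the same
    # command again after a stall picks up where it left off.
    rsync -a --partial "$1/.git/" "$2/.git/" &&

    # Housekeeping on completion: verify the objects, repack, and
    # populate the working tree from HEAD.
    git -C "$2" fsck --no-progress &&
    git -C "$2" gc -q &&
    git -C "$2" reset -q --hard
}
```

The fsck step is what would catch the case listed under Cons, where the
source changed while the copy was in progress.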

Clone resume may not be a priority, but if minor changes can help with
recovery, that would be nice.

I still like the bundle method, if git services would make this easy.
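
For reference, the manual version of the bundle method looks roughly like
this. It is a sketch: the function names make_bundle and clone_from_bundle
are illustrative, and the download step in between is left to whatever
resumable tool is at hand (for example "wget -c" or "curl -C -").

```shell
#!/bin/sh
# Sketch only: the bundle workflow in two halves.

# On a machine with a reliable connection: pack every ref of an existing
# clone into one static file that any web server can offer.
# $1 = path to the clone, $2 = output file (absolute path)
make_bundle() {
    git -C "$1" bundle create "$2" --all
}

# On the slow link, once the bundle file has been downloaded in full:
# clone from the file, then point origin at the real repository so that
# later fetches only transfer what is newer than the bundle.
# $1 = bundle file, $2 = target directory, $3 = real repository URL
clone_from_bundle() {
    git clone -q "$1" "$2" &&
    git -C "$2" remote set-url origin "$3"
}
```

After that, an ordinary "git fetch origin" in the new clone brings it up to
date with a much smaller transfer.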

-- 
Regards.
V.Krishn