Re: Stalled git cloning and possible solutions

On Fri, Aug 30, 2013 at 4:10 AM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
> V.Krishn wrote:
>
>> Quite sometimes when cloning a large repo stalls, hitting Ctrl+c cleans what
>> been downloaded, and process needs re-start.
>>
>> Is there a way to recover or continue from already downloaded files during
>> cloning ?
>
> No, sadly.  The pack sent for a clone is generated dynamically, so
> there's no easy way to support the equivalent of an HTTP Range request
> to resume.  Someone might implement an appropriate protocol extension
> to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
> but for now it doesn't exist.

OK, how about a new capability "resume" for upload-pack? fetch-pack can
then send the capability "resume[=<SHA-1>,<skip>]" to upload-pack. The
first time, it sends "resume" without parameters, and upload-pack sends
back a SHA-1 that identifies the pack being transferred, together with
a full pack as usual. When an early disconnection happens, fetch-pack
sends, on the next attempt, the SHA-1 it received and the size of the
pack received so far. It then receives either the remaining part or a
full pack.
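
To make the exchange concrete, here is a rough sketch (in Python, for
brevity) of how the capability string could be built and parsed. The
function names are made up for illustration, and the wire format is only
the "resume[=<SHA-1>,<skip>]" form proposed above; nothing like this
exists in git today.

```python
from typing import Optional, Tuple

HEX_DIGITS = "0123456789abcdef"


def resume_capability(pack_id: Optional[str] = None, received: int = 0) -> str:
    """Build the proposed "resume" capability string (hypothetical helper).

    With no pack_id this is the first attempt; otherwise pack_id is the
    SHA-1 the server sent earlier and received is the byte count we kept.
    """
    if pack_id is None:
        return "resume"
    if len(pack_id) != 40 or any(c not in HEX_DIGITS for c in pack_id):
        raise ValueError("pack_id must be a 40-character lowercase hex SHA-1")
    if received < 0:
        raise ValueError("received must be non-negative")
    return f"resume={pack_id},{received}"


def parse_resume_capability(cap: str) -> Tuple[Optional[str], int]:
    """Server-side inverse: return (pack_id or None, skip)."""
    if cap == "resume":
        return None, 0
    prefix = "resume="
    if not cap.startswith(prefix):
        raise ValueError("not a resume capability")
    pack_id, _, skip = cap[len(prefix):].partition(",")
    return pack_id, int(skip)
```

The first request would carry resume_capability(), and a retry after a
disconnect would carry resume_capability(saved_sha1, bytes_on_disk).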

When upload-pack gets "resume", it calculates a checksum of all input
that may impact pack generation. If the checksum matches the SHA-1
from fetch-pack, it continues to generate the pack as usual, but
skips sending the first <skip> bytes (maybe with a fake header so
that fetch-pack realizes this is a partial pack). If the checksum does
not match, it sends the full pack again. I count on index-pack to spot
a resumed pack that is corrupt due to bugs.
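
The server-side decision could look roughly like the sketch below
(Python again). It holds the whole pack in memory only to keep the
example short, whereas a real upload-pack would stream it; the function
name is hypothetical and the returned flag stands in for the "fake
header" mentioned above.

```python
from typing import Optional, Tuple


def bytes_to_send(pack: bytes, current_id: str,
                  requested_id: Optional[str] = None,
                  skip: int = 0) -> Tuple[bool, bytes]:
    """Return (resumed, payload) for one request (hypothetical helper).

    current_id   -- checksum upload-pack computed for this request
    requested_id -- SHA-1 sent by fetch-pack, or None on a first attempt
    skip         -- number of bytes fetch-pack says it already has
    """
    can_resume = (
        requested_id is not None
        and requested_id == current_id
        and 0 <= skip <= len(pack)
    )
    if can_resume:
        # Same pack as before: send only the part the client is missing.
        return True, pack[skip:]
    # First attempt, or the inputs changed: fall back to the full pack.
    return False, pack
```

On the client, the bytes already on disk plus a resumed payload should
add up to the same pack a full transfer would have produced, and
index-pack then verifies the result.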

The input to the SHA-1 checksum includes:

 - the result SHA-1 list from rev-list
 - git version string
 - .git/shallow
 - replace object database
 - pack.* config
 - maybe some other variables (I haven't checked pack-objects)
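
As an illustration only, the checksum over the inputs listed above
might be computed like this (Python sketch). The function name, the
parameter names, and the length-prefixed framing are my assumptions; the
only requirement is that any change in an input changes the result.

```python
import hashlib
from typing import Dict, List


def pack_identity(rev_list: List[str], git_version: str, shallow: bytes,
                  replace_refs: Dict[str, str],
                  pack_config: Dict[str, str]) -> str:
    """Hash everything that may influence pack generation (hypothetical)."""
    h = hashlib.sha1()

    def feed(label: str, data: bytes) -> None:
        # Label and length prefix keep adjacent fields from running together.
        h.update(f"{label} {len(data)}\0".encode())
        h.update(data)

    # rev-list order is kept as-is: a different order is a different input.
    feed("rev-list", "\n".join(rev_list).encode())
    feed("version", git_version.encode())
    feed("shallow", shallow)  # raw contents of .git/shallow
    # Maps are sorted so that dictionary ordering does not affect the hash.
    feed("replace", "\n".join(
        f"{k} {v}" for k, v in sorted(replace_refs.items())).encode())
    feed("config", "\n".join(
        f"{k}={v}" for k, v in sorted(pack_config.items())).encode())
    return h.hexdigest()
```

Any further variables that turn out to affect pack-objects would simply
be fed in the same way.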

Another Git implementation can generate this SHA-1 in a totally
different way and may even cache the generated pack.

If, at resume time, the load balancer directs the request to another
upload-pack that generates this SHA-1 differently, then this won't work
(i.e. the full pack is returned). In a busy repository, some refs may
have moved, so the rev-list result at resume time won't match any more,
but we can deal with that later by relaxing upload-pack to allow "want"
lines with SHA-1s that are reachable from the current refs, not just
SHA-1s that are refs themselves (pack v4 or reachability bitmaps help).
-- 
Duy