Re: Continue git clone after interruption

On Tue, 18 Aug 2009, Jakub Narebski wrote:

> You can probably get the number and size taken by delta and non-delta
> (base) objects in the packfile somehow.  Neither "git verify-pack -v
> <packfile>" nor contrib/stats/packinfo.pl helped me arrive at this data.

Documentation for verify-pack says:

|When specifying the -v option the format used is:
|
|        SHA1 type size size-in-pack-file offset-in-packfile
|
|for objects that are not deltified in the pack, and
|
|        SHA1 type size size-in-packfile offset-in-packfile depth base-SHA1
|
|for objects that are deltified.

So a simple script should be able to give you the answer.
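For example, a little awk over the -v output can split the totals (the pack path below is illustrative; base-object lines have five fields, delta lines seven):

```shell
# Sum object counts and in-pack sizes, split by base vs. deltified
# objects.  The type filter skips verify-pack's summary lines.
git verify-pack -v .git/objects/pack/pack-*.idx |
awk '$2 ~ /^(commit|tree|blob|tag)$/ {
         if (NF == 5) { base_n++;  base_sz  += $4 }
         if (NF == 7) { delta_n++; delta_sz += $4 }
     }
     END {
         printf "base objects:  %d, %d bytes in pack\n", base_n, base_sz
         printf "delta objects: %d, %d bytes in pack\n", delta_n, delta_sz
     }'
```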

> >> (BTW what happens if this pack is larger than file size limit for 
> >> given filesystem?).
> > 
> > We currently fail.  Seems that no one ever had a problem with that so 
> > far. We'd have to split the pack stream into multiple packs on the 
> > receiving end.  But frankly, if you have a repository large enough to 
> > bust your filesystem's file size limit then maybe you should seriously 
> > reconsider your choice of development environment.
> 
> Do we fail gracefully (with an error message), or does git crash then?

If the filesystem is imposing the limit, it will likely return an error 
on the write() call and we'll die().  If the machine's off_t is too 
small for the received pack then we also die("pack too large for 
current definition of off_t").

> If I remember correctly FAT28^W FAT32 has a maximum file size of 2 GB.
> FAT is often used on SSDs and USB drives.  Although if you have a 2 GB
> packfile, you are doing something wrong, or UGFWIINI (Using Git For
> What It Is Not Intended).

Hopefully you're not performing a 'git clone' onto a FAT filesystem.  
For physical transport you may repack with the appropriate switches.
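Presumably "the appropriate switches" means something like repack's --max-pack-size, which splits the object store into several packs, each under the given ceiling (the 1g value is just an example):

```shell
# Repack everything into packs no larger than 1 GiB each, so every
# resulting pack-*.pack (and its .idx) fits on a size-limited volume.
git repack -a -d --max-pack-size=1g
```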

> >> If it fails, the client asks first for the first half of the
> >> repository (half as in bisect, but it is the server that has to
> >> calculate it).  If that downloads, it will ask the server for the
> >> rest of the repository.  If it fails, it would reduce the size by
> >> half again, and ask for 1/4 of the repository in a packfile first.
> > 
> > This won't help people with slow links at all.  What if the network
> > connection breaks only after 49% of the transfer, and that took 3
> > hours to download?  You'll attempt a 25% size transfer which would
> > take 1.5 hours, despite having already spent that much time
> > downloading the first 1/4 of the repository.  And what if you're
> > unlucky and the network craps out on you after 23% of that second
> > attempt?
> 
> A modification then.
> 
> First try an ordinary clone.  If it fails because the network is
> unreliable, check how much we did download, and ask the server for a
> packfile of slightly smaller size; this means we are asking the server
> for an approximate pack size limit, not for a bisect-like partitioning
> of the revision list.

If the download didn't reach past the critical point (75 MB in my linux 
repo example) then you cannot validate the received data and you've 
wasted that much bandwidth.

> > I think it is better to "prime" the repository with the content of the 
> > top commit in the most straight forward manner using git-archive which 
> > has the potential to be fully restartable at any point with little 
> > complexity on the server side.
> 
> But didn't it make fully restartable 2.5 MB part out of 37 MB packfile?

The front of the pack is the critical point.  If you get enough to 
create the top commit then further transfers can be done incrementally 
with only the deltas between successive commits.

> A question about pack protocol negotiation.  If the client presents
> some objects as "have", the server can and does assume that the client
> has all prerequisites for those objects, e.g. for a tree object that
> it has all objects for the files and directories inside the tree; for
> a commit it means all ancestors and all objects in its snapshot (the
> top tree, and its prerequisites).  Do I understand this correctly?

That works only for commits.
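A sketch of the exchange may make this concrete (pkt-line framing omitted, placeholder names instead of real SHA1s):

```
C: want <commit-sha1>
C: have <commit-sha1>     # only commits are meaningful here: the server
C: done                   # infers the client has that commit's entire
                          # ancestry, including all trees and blobs
S: ACK <commit-sha1>      # common point found; the pack starts there
```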

> If we have a partial packfile from a download that crashed, can we
> extract some full objects (including blobs) from it?  Can we pass
> tree and blob objects as "have" to the server, and are they taken
> into account?

No.

> Perhaps instead of a separate step of resumably downloading the top
> commit's objects (the snapshot), we can tell the server which objects
> we did download in full?

See above.

> BTW. because of compression it might be more difficult to resume 
> archive creation in the middle, I think...

Why so?  The tar+gzip format is streamable.
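A quick way to see this: with `gzip -n` (which omits the timestamp) the archive stream is reproducible, so a server can regenerate it and skip however many bytes the client already received.  The byte offset below is arbitrary:

```shell
# git archive emits a stable tar for a given tree, and gzip -n makes
# the compressed stream reproducible, so a "resume" is just the same
# pipeline with the already-sent prefix discarded.
git archive --format=tar HEAD | gzip -n > full.tgz
git archive --format=tar HEAD | gzip -n | tail -c +1001 > rest
head -c 1000 full.tgz | cat - rest | cmp - full.tgz && echo "resume OK"
```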


Nicolas