On Thu, Nov 28, 2013 at 04:09:18PM +0700, Duy Nguyen wrote:

> > Git should have better support for resuming transfers. Right now it
> > does not seem to do that job well. Sharing code, managing code,
> > transferring code: what kind of VCS do we imagine it to be?
>
> You're welcome to step up and do it. Off the top of my head there are
> a few options:
>
> - better integration with git bundles, provide a way to seamlessly
>   create/fetch/resume the bundles with "git clone" and "git fetch"

I posted patches for this last year. One of the things that I got hung
up on was that I spooled the bundle to disk, and then cloned from it,
which meant that you needed twice the disk space for a moment. I wanted
to teach index-pack to "--fix-thin" a pack that was already on disk, so
that we could spool to disk, and then finalize it without making
another copy.

One of the downsides of this approach is that it requires the repo
provider (or somebody else) to provide the bundle. I think that is
something that a big site like GitHub would do (and probably push the
bundles out to a CDN, too, to make getting them faster). But it's not a
universal solution.

> - stabilize pack order so we can resume downloading a pack

I think stabilizing in all cases (e.g., including ones where the
content has changed) is hard, but I wonder if it would be enough to
handle the easy cases, where nothing has changed. If the server does
not use multiple threads for delta computation, it should generate the
same pack from the same on-disk data deterministically. We just need a
way for the client to indicate that it has the same partial pack.

I'm thinking that the server would report some opaque hash representing
the current pack. The client would record that, along with the number
of pack bytes it received. If the transfer is interrupted, the client
comes back with the hash/bytes pair. The server starts to generate the
pack, checks whether the hash matches, and if so, says "here is the
same pack, resuming at byte X".

What would need to go into such a hash? It would need to represent the
exact bytes that will go into the pack, but without actually generating
those bytes. Perhaps a sha1 over the sequence of <object sha1, type,
base (if applicable), length> for each object would be enough. We
should know all of that after calling compute_write_order. If the
client has a match, we should be able to skip ahead to the correct
byte. (Rough sketches of the hash and of the resume exchange follow at
the end of this mail.)

> - remote alternates, the repo will ask for more and more objects as
>   you need them (so goodbye to distributed model)

This is also something I've been playing with, but just for very large
objects (so to support something like git-media, but below the object
graph layer). I don't think it would apply here, as the kernel has a
lot of small objects, and getting them in the tight delta'd pack format
increases efficiency a lot.

-Peff
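
As an illustration only (this is not git code; the PackedObject type,
its fields, and pack_identity are invented names), a minimal Python
sketch of hashing the <sha1, type, base, length> sequence in pack write
order, without producing any pack bytes:

    import hashlib
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class PackedObject:
        sha1: str            # object name, hex
        type: str            # "commit", "tree", "blob", or "tag"
        base: Optional[str]  # delta base object name, if stored as a delta
        size: int            # bytes the object will occupy in the pack

    def pack_identity(write_order: List[PackedObject]) -> str:
        """Hash the <sha1, type, base, length> of every object, in pack
        write order, without generating any pack bytes."""
        h = hashlib.sha1()
        for obj in write_order:
            h.update(obj.sha1.encode())
            h.update(obj.type.encode())
            h.update((obj.base or "-").encode())
            h.update(str(obj.size).encode())
        return h.hexdigest()

The real thing would presumably live in pack-objects, right after
compute_write_order, since by then the object order, delta choices, and
sizes are all known.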
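
And an equally hand-wavy sketch of the resume exchange, assuming a
pack_identity() like the one above; the state-file format and function
names are again invented for illustration:

    def save_resume_state(path: str, identity: str, received: int) -> None:
        """Client side: remember which pack we were downloading and how
        far we got, so an interrupted fetch can ask to resume."""
        with open(path, "w") as f:
            f.write("%s %d\n" % (identity, received))

    def load_resume_state(path: str):
        """Client side: return (identity, bytes_received), or None if
        there is nothing to resume."""
        try:
            with open(path) as f:
                identity, received = f.read().split()
                return identity, int(received)
        except (FileNotFoundError, ValueError):
            return None

    def resume_offset(claimed_identity: str, received: int, write_order) -> int:
        """Server side: regenerate the write order and resume at the
        client's offset only if the pack would still be byte-identical;
        otherwise start over from byte 0."""
        if pack_identity(write_order) == claimed_identity:
            return received
        return 0

In the real protocol this would presumably be an extra capability in
the fetch negotiation, and the saved state would live somewhere under
.git; the point is just that the identity hash is cheap to compute
before any pack data is written, so the server can decide up front
whether resuming is safe.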