Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> I was on a crappy connection and it was frustrating seeing git-clone
> reach 80% then fail, then start over again. Can we support resumable
> git-clone at some level? I think we could split into several small
> packs, keep the fetched ones, and just get the missing packs until we
> have them all.

This is, uh, difficult over the native git protocol.

The problem is that the native protocol negotiates what the client
already has and what it needs by comparing sets of commits.  If the
client says "I have commit X" then the server assumes it has not only
commit X _but also every object reachable from it_.

Now packfiles are organized to place commits at the front of the
packfile.  So a truncated download will give the client a whole host
of commits, maybe even all of them, but none of the trees or blobs
associated with them, as those come behind the commits.  Worse, the
commits are sorted most recent to least recent.  So if the client
claims it has the very first commit it received, that is currently an
assertion that it has the entire repository.

I have been thinking about this resumable fetch idea for the native
protocol for a few days now, since the last time it came up on #git.

One possibility is to have the client store locally, in a temporary
file, the list of wants and the list of haves it sent to the server
during the last fetch.  During a resume of a packfile download we just
replay this list of wants/haves, even if the server has newer data.
We also tell the server which object we last successfully downloaded
(its SHA-1).  (There is a rough sketch of this at the end of this
mail.)

The server would only accept the resumed want list if all of the wants
are reachable from its current refs.  If one or more aren't, they are
just culled from the want list; this way you can still successfully
resume a download of, say, git.git, where pu rebases often.  You just
might not get pu without going back for it.

If the server always performs a very stable (meaning we don't ever
change the sorting order!) and deterministic sorting of the objects in
the packfile, then given the same list of wants/haves and a "prior"
point it can pick up from where it left off.  At worst we are
retransmitting one whole object again, e.g. the client had all but the
last byte of the object, so it was no good.  I'm willing to say we do
the full object retransmission in case the object was recompressed on
the server between the first fetch and the second.  It just simplifies
the restart.

Probably not that difficult.  The hardest part is committing to the
object sorting order so that when we ask for a restart we *know* we
didn't miss an object.

> I didn't clone via http so I don't know if http supports resumable.

This would have a better chance at doing a resume.  Looking at the
code, it looks like we do in fact resume a packfile download if it was
truncated.  (A toy byte-range sketch is also at the end of this mail.)

-- 
Shawn.
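
To make the want/have replay a little more concrete, here is a rough
sketch of the client-side bookkeeping I have in mind.  This is purely
illustrative Python; nothing below exists in git, and the state file
name and helper names are made up:

# Illustrative only: remember exactly what we negotiated, then
# replay it on resume.
import os

STATE_FILE = os.path.join(".git", "RESUME_STATE")   # hypothetical path

def save_negotiation(wants, haves):
    # Record the exact want/have lists we sent to the server.
    with open(STATE_FILE, "w") as f:
        for sha in wants:
            f.write("want %s\n" % sha)
        for sha in haves:
            f.write("have %s\n" % sha)

def build_resume_request(last_object_sha):
    # Replay the recorded negotiation, even if the server has newer
    # data, and tell it the last object we received in full.  The
    # server would cull any want no longer reachable from its refs
    # (a rebased pu, say), re-run the same deterministic object
    # ordering, and restart the stream just after last_object_sha.
    wants, haves = [], []
    with open(STATE_FILE) as f:
        for line in f:
            kind, sha = line.split()
            (wants if kind == "want" else haves).append(sha)
    return wants, haves, last_object_sha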
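
For the http side, the reason a resume is comparatively easy is that
the packfile is just a static file on the server, so a truncated
download can be continued with an ordinary byte-range request.  Again
a toy illustration, not git's actual http-fetch code:

import os
import urllib.request

def resume_pack_download(url, path):
    have = os.path.getsize(path) if os.path.exists(path) else 0
    req = urllib.request.Request(url)
    if have:
        req.add_header("Range", "bytes=%d-" % have)  # ask only for the tail
    with urllib.request.urlopen(req) as resp:
        # 206 means the server honored the range; otherwise start over.
        mode = "ab" if resp.status == 206 else "wb"
        with open(path, mode) as out:
            out.write(resp.read())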