> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
> > In practice, I suspect that these fetches would go in parallel with
> > the processing of the in-protocol packfile, but spelling it out as
> > if they were done sequentially would help establish the right
> > mental model.
> >
> > "(1) Process in-protocol packfiles first, and then (2) fetch CDN
> > URIs, and after all is done, (3) update the tips of refs" would
> > serve as a base to establish a good mental model.  It would be even
> > better to throw in another step before all that: (0) record the
> > wanted-refs and CDN URIs to a safe place.  If you get disconnected
> > before finishing (1), you have to redo from scratch, but once you
> > have finished (0) and (1), then (2) and (3) can be done at your
> > leisure using the information you saved in step (0), and (1) can be
> > retried if your connection is lousy.
>
> We need to be a bit careful here.  After finishing (0) and (1), the
> most recent history near the requested tips is not anchored by any
> ref.  We of course cannot point refs at these "most recent" objects
> because it is very likely that they are not connected to the parts
> of the history we already have in the receiving repository.  The
> huge gap exists to be filled by the bulk download from the CDN.
>
> So a GC that happens before (3) completes can discard object data
> obtained in step (1).  One way to protect it may be to use a .keep
> file, but then some procedure needs to be there to remove it once we
> are done.  Perhaps at the end of (1), the name of that .keep file is
> added to the set of information we keep until (3) happens (the
> remainder of that set was obtained in step (0)).

Yes, this is precisely what we're doing: the packs obtained through
the packfile URIs are all written with keep files, and the names of
the keep files are added to a list.  They are then deleted at the
same time as the regular keep file (the one generated during an
ordinary fetch).  I'll also add this information to the spec.
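
For concreteness, here is a minimal, self-contained sketch of that
keep-file bookkeeping.  It is not the actual fetch code; every name,
path, and helper in it (keep_list, record_keep, remove_keeps, the
pack paths) is made up for illustration.  It only shows the
lifecycle: each pack -- the in-protocol one and the ones fetched from
packfile URIs -- is written with a .keep file, the names are
collected in a list, and the whole list is unlinked only after step
(3) has updated the refs.

  /*
   * keep_lifecycle.c -- a self-contained sketch of the bookkeeping
   * described above.  This is NOT the actual git code; every name
   * and path below is made up for illustration.
   */
  #include <errno.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  struct keep_list {
      char **names;
      size_t nr, alloc;
  };

  /*
   * Steps (1) and (2): after each pack (in-protocol or CDN) has
   * been indexed with a .keep file, remember that file's name so
   * it can be removed once the refs anchor the new history.
   */
  static void record_keep(struct keep_list *kl, const char *name)
  {
      if (kl->nr == kl->alloc) {
          kl->alloc = kl->alloc ? 2 * kl->alloc : 8;
          kl->names = realloc(kl->names, kl->alloc * sizeof(*kl->names));
          if (!kl->names) {
              perror("realloc");
              exit(1);
          }
      }
      kl->names[kl->nr] = strdup(name);
      if (!kl->names[kl->nr]) {
          perror("strdup");
          exit(1);
      }
      kl->nr++;
  }

  /*
   * Step (3) has completed: the refs now reach everything, so
   * every .keep file on the list can be unlinked.
   */
  static void remove_keeps(struct keep_list *kl)
  {
      size_t i;

      for (i = 0; i < kl->nr; i++) {
          if (unlink(kl->names[i]) && errno != ENOENT)
              fprintf(stderr, "warning: unable to remove %s\n",
                      kl->names[i]);
          free(kl->names[i]);
      }
      free(kl->names);
      kl->names = NULL;
      kl->nr = kl->alloc = 0;
  }

  int main(void)
  {
      struct keep_list kl = { NULL, 0, 0 };

      /* (1) the regular keep file for the in-protocol pack */
      record_keep(&kl, ".git/objects/pack/pack-inproto.keep");

      /* (2) one keep file per pack fetched from a packfile URI */
      record_keep(&kl, ".git/objects/pack/pack-cdn-0001.keep");
      record_keep(&kl, ".git/objects/pack/pack-cdn-0002.keep");

      /* ... (3) update the tips of refs here, and only then ... */

      remove_keeps(&kl);
      return 0;
  }

The property that matters is the ordering: remove_keeps() runs only
after the ref update, so if the process dies anywhere before (3)
completes, the .keep files survive on disk and a concurrent GC
cannot discard the downloaded packs.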