> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
> > In practice, I suspect that these fetches would go in parallel with
> > the processing of the in-protocol packfile, but spelling it out as
> > if they were done sequentially would help establish the right
> > mental model.
> >
> > "(1) Process in-protocol packfiles first, and then (2) fetch CDN
> > URIs, and after all is done, (3) update the tips of refs" would
> > serve as a base to establish a good mental model.  It would be even
> > better to throw in another step before all that: (0) record the
> > wanted-refs and CDN URIs to a safe place.  If you get disconnected
> > before finishing (1), you have to redo from scratch, but once you
> > have finished (0) and (1), then (2) and (3) can be done at your
> > leisure using the information you saved in step (0), and (1) can be
> > retried if your connection is lousy.
>
> We need to be a bit careful here.  After finishing (0) and (1), the
> most recent history near the requested tips is not anchored by any
> ref.  We of course cannot point refs at these "most recent" objects
> because it is very likely that they are not connected to the parts
> of the history we already have in the receiving repository.  The
> huge gap exists to be filled by the bulk download from the CDN.
>
> So a GC that happens before (3) completes can discard object data
> obtained in step (1).  One way to protect it may be to use a .keep
> file, but then some procedure needs to be there to remove it once we
> are done.  Perhaps at the end of (1), the name of that .keep file is
> added to the set of information we keep until (3) happens (the
> remainder of that set was obtained in step (0)).

Yes, this is precisely what we're doing: the packs obtained through
the packfile URIs are all written with keep files, and the names of
the keep files are added to a list.  They are then deleted at the
same time as the regular keep file (the one generated during an
ordinary fetch).  I'll also add this information to the spec.
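
For concreteness, here is a minimal, self-contained sketch of that
keep-file bookkeeping.  It is not the actual fetch code; every name,
path, and helper in it (keep_list, record_keep, remove_keeps, the
pack paths) is made up for illustration.  It only shows the
lifecycle: each pack -- the in-protocol one and the ones fetched from
packfile URIs -- is written with a .keep file, the names are
collected in a list, and the whole list is unlinked only after step
(3) has updated the refs.

  /*
   * keep_lifecycle.c -- a self-contained sketch of the bookkeeping
   * described above.  This is NOT the actual git code; every name
   * and path below is made up for illustration.
   */
  #include <errno.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  struct keep_list {
      char **names;
      size_t nr, alloc;
  };

  /*
   * Steps (1) and (2): after each pack (in-protocol or CDN) has
   * been indexed with a .keep file, remember that file's name so
   * it can be removed once the refs anchor the new history.
   */
  static void record_keep(struct keep_list *kl, const char *name)
  {
      if (kl->nr == kl->alloc) {
          kl->alloc = kl->alloc ? 2 * kl->alloc : 8;
          kl->names = realloc(kl->names, kl->alloc * sizeof(*kl->names));
          if (!kl->names) {
              perror("realloc");
              exit(1);
          }
      }
      kl->names[kl->nr] = strdup(name);
      if (!kl->names[kl->nr]) {
          perror("strdup");
          exit(1);
      }
      kl->nr++;
  }

  /*
   * Step (3) has completed: the refs now reach everything, so
   * every .keep file on the list can be unlinked.
   */
  static void remove_keeps(struct keep_list *kl)
  {
      size_t i;

      for (i = 0; i < kl->nr; i++) {
          if (unlink(kl->names[i]) && errno != ENOENT)
              fprintf(stderr, "warning: unable to remove %s\n",
                      kl->names[i]);
          free(kl->names[i]);
      }
      free(kl->names);
      kl->names = NULL;
      kl->nr = kl->alloc = 0;
  }

  int main(void)
  {
      struct keep_list kl = { NULL, 0, 0 };

      /* (1) the regular keep file for the in-protocol pack */
      record_keep(&kl, ".git/objects/pack/pack-inproto.keep");

      /* (2) one keep file per pack fetched from a packfile URI */
      record_keep(&kl, ".git/objects/pack/pack-cdn-0001.keep");
      record_keep(&kl, ".git/objects/pack/pack-cdn-0002.keep");

      /* ... (3) update the tips of refs here, and only then ... */

      remove_keeps(&kl);
      return 0;
  }

The property that matters is the ordering: remove_keeps() runs only
after the ref update, so if the process dies anywhere before (3)
completes, the .keep files survive on disk and a concurrent GC
cannot discard the downloaded packs.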