Re: [PATCH] fetch-pack: write fetched refs to .promisor

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Thu, 5 Sep 2019 11:39:26 -0700

> I'm not really opposed to what you're doing here, but I did recently
> think of another possible use for .promisor files. So it seems like a
> good time to bring it up, since presumably we'd have to choose one or
> the other.

Thanks for bringing it up - yes, we should discuss this.

> I noticed when playing with partial clones that the client may sometimes
> pause for a while, chewing CPU. The culprit is is_promisor_object(),
> which generates the list of known promisor objects by opening every
> object we _do_ have to find out which ones they mention.
> 
> I know one of the original design features of the promisor pack was that
> the client would _not_ keep a list of all of the objects it didn't have.
> But I wonder if it would make sense to keep a cache of these "cut
> points" in the partial clone. That's potentially smaller than the
> complete set of objects (especially for tree-based partial cloning), and
> it seems clear we're willing to store it in memory anyway.
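To make the "cut points" idea concrete, here is a minimal sketch using a toy in-memory model rather than Git's real object store. `struct toy_object`, `have_object()` and `find_cut_points()` are hypothetical names, and the linear scans stand in for a hash-based set (Git's own code uses an oidset). The cut points are the IDs that objects we have refer to but that we do not have ourselves. That is narrower than the set is_promisor_object() builds, which also includes the promisor objects we do have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_REFS 4

/*
 * Hypothetical toy model, not Git's object store: each object we have
 * locally records the IDs it references (a commit's tree and parents,
 * a tree's entries). Unused slots in refs[] are NULL.
 */
struct toy_object {
	const char *id;
	const char *refs[TOY_MAX_REFS];
};

/* Linear scan; real code would use a hash-based set instead. */
static int have_object(const struct toy_object *have, size_t nr,
		       const char *id)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(have[i].id, id))
			return 1;
	return 0;
}

/*
 * Collect the "cut points": IDs referenced by objects we have but not
 * present themselves. Stores at most `max` distinct IDs in `out`, in
 * the order first seen, and returns how many were stored.
 */
static size_t find_cut_points(const struct toy_object *have, size_t nr,
			      const char **out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);
	for (size_t i = 0; i < nr; i++) {
		for (size_t j = 0; j < TOY_MAX_REFS && have[i].refs[j]; j++) {
			const char *ref = have[i].refs[j];
			int seen = 0;

			if (have_object(have, nr, ref))
				continue;	/* present locally, not a cut point */
			for (size_t k = 0; k < n; k++)
				if (!strcmp(out[k], ref))
					seen = 1;
			if (!seen && n < max)
				out[n++] = ref;
		}
	}
	return n;
}
```

Computing this still requires opening every object we have, once; a cache would only save us from repeating that work on each lookup.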

Well, before the current design was implemented, I had a design that had
such a list of missing objects. :-) I couldn't find a writeup, but here
is some preliminary code [1]. In that code, as far as I can tell, the
server gives us the list directly during fetch and the client merges it
with a repository-wide file called $GIT_DIR/objects/promisedblob, but we
don't have to follow that design (we could lazily generate the file, have
per-packfile promisedblob files, etc.).
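The merge step could look roughly like this, assuming (hypothetically; not necessarily the format used in [1]) that both the list sent by the server and the repository-wide file hold sorted, duplicate-free hex object IDs. Reading and rewriting the file is left out, and `merge_promised()` is an illustrative name only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Merge two sorted, duplicate-free lists of hex object IDs: `a` (the
 * repository-wide list) and `b` (the list received during fetch).
 * `out` must have room for na + nb entries. An ID present in both
 * inputs is written once. Returns the number of IDs written.
 */
static size_t merge_promised(const char **a, size_t na,
			     const char **b, size_t nb,
			     const char **out)
{
	size_t i = 0, j = 0, n = 0;

	assert(out != NULL || na + nb == 0);
	while (i < na && j < nb) {
		int cmp = strcmp(a[i], b[j]);

		if (cmp < 0) {
			out[n++] = a[i++];
		} else if (cmp > 0) {
			out[n++] = b[j++];
		} else {
			/* Same ID on both sides: keep one copy. */
			out[n++] = a[i++];
			j++;
		}
	}
	while (i < na)
		out[n++] = a[i++];
	while (j < nb)
		out[n++] = b[j++];
	return n;
}
```

Keeping the file sorted makes each merge a single linear pass, and lets lookups use binary search without loading the whole list into a hash table first.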

[1] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@xxxxxxxxxx/

> And if we do that, would the .promisor file for a pack be a good place
> to store it?

After looking at [1], I think the list might be better kept somewhere
else. If we want to preserve fast fetches, we still need a file that marks
the pack as a promisor pack, so ".promisor" seems good for that. The presence or
absence of the cutoff points is a separate issue and could go into a
separate file, and it might be worth putting all cutoff points into a
single per-repository file too.
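With that split, a lookup might go roughly as follows. This is a sketch under assumptions: a pack is recognized as a promisor pack by its "pack-<hash>.promisor" marker alone, and the cutoff points for the whole repository have been loaded from a hypothetical per-repository file into a sorted array. `promisor_marker_path()` and `is_cut_point()` are illustrative names, not existing Git functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Derive the marker name for a pack: "pack-X.pack" -> "pack-X.promisor".
 * Returns 0 on success, -1 if `pack` does not end in ".pack" or `out`
 * is too small.
 */
static int promisor_marker_path(const char *pack, char *out, size_t outsz)
{
	size_t len = strlen(pack);
	int w;

	if (len < 5 || strcmp(pack + len - 5, ".pack"))
		return -1;
	w = snprintf(out, outsz, "%.*s.promisor", (int)(len - 5), pack);
	return (w < 0 || (size_t)w >= outsz) ? -1 : 0;
}

/* bsearch() comparator: key and element both point at a `const char *`. */
static int cmp_id(const void *key, const void *elem)
{
	const char *const *k = key;
	const char *const *e = elem;

	return strcmp(*k, *e);
}

/*
 * Look `id` up in the sorted array of cutoff points, assumed to have been
 * loaded from a (hypothetical) per-repository file.
 */
static int is_cut_point(const char **cut_points, size_t nr, const char *id)
{
	assert(id != NULL);
	if (nr == 0)
		return 0;	/* bsearch() needs a valid base pointer */
	return bsearch(&id, cut_points, nr, sizeof(*cut_points), cmp_id) != NULL;
}
```

Keeping the marker separate means recognizing a promisor pack needs only a file-existence check, independent of whether the cutoff list has been computed yet.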