Re: [External] Re: Missing Promisor Objects in Partial Repo Design Doc

Han Young <hanyang.tony@xxxxxxxxxxxxx> · Wed, 2 Oct 2024 15:57:57 +0800

On Wed, Oct 2, 2024 at 10:55 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:

> Stepping back a bit, why is the loss of C2a/C2b/C2 a problem after
> "git gc"?  Wouldn't these "missing" objects be lazily fetchable, now
> C3 is known to the remote and the remote promises everything
> reachable from what they offer are (re)fetchable from them?  IOW, is
> this a correctness issue, or only performance issue (of having to
> re-fetch what we once locally had)?
>
> Is this true?  Can we tell, when trying to access C2a/C2b/C2 after
> the current version of "git gc" removes them from the local object
> store, that they are missing due to repository corruption?  After
> all, C3 can reach them so wouldn't it be possible for us to fetch
> them from the promisor remote?
>
> After a lazy clone that omits a lot of objects acquires many objects
> over time by fetching missing objects on demand, wouldn't we want to
> have an option to "slim" the local repository by discarding some of
> these objects (the ones that are least frequently used), relying on
> the promise by the promisor remote that even if we did so, they can
> be fetched again?  Can we treat loss of C2a/C2b/C2 as if such a
> feature prematurely kicked in?  Or are we failing to refetch them
> for some reason?

In a blobless clone, we expect commits and trees to be present in the repo.
If C2/T2 is missing, commands like "git merge" will fail with
"refusing to merge unrelated histories". Commands like "git log" will
try to lazily fetch the commit but, without 'have' negotiation, end up
pulling all the trees and blobs reachable from that commit.
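
To make the cost concrete, here is a rough toy model of the difference
(plain Python, not Git code). The object names, the graph shape and the
fetch() helper are all made up for illustration, and no blob filter is
modelled; it only shows how 'have' lines shrink what the server sends.

```python
# Toy model only: a hypothetical object graph, not real Git data structures.
# Each object maps to the objects it references (commit -> parent + tree,
# tree -> blobs).
GRAPH = {
    "C3": ["C2", "T3"],
    "C2": ["C1", "T2"],
    "C1": ["T1"],
    "T3": ["B1", "B3"],
    "T2": ["B1", "B2"],
    "T1": ["B1"],
    "B1": [],
    "B2": [],
    "B3": [],
}


def reachable(roots):
    """Return every object reachable from the given roots (roots included)."""
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(GRAPH[obj])
    return seen


def fetch(want, haves=()):
    """Hypothetical server side: send what 'want' reaches, minus what the
    client claims to have (and everything reachable from those haves)."""
    return reachable([want]) - reachable(haves)


# Lazy fetch of the missing commit C2 with no negotiation at all.
without_haves = fetch("C2")

# Same fetch, assuming the client could advertise C1 as a 'have'.
with_haves = fetch("C2", haves=["C1"])

print("no haves:  ", sorted(without_haves))
print("with haves:", sorted(with_haves))
```

In this sketch the fetch without haves drags in C1, T1 and B1 again even
though they are already local, while the negotiated fetch sends only C2
and what it newly introduces.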

It's possible to minimize the impact of missing commits by adding negotiation
to lazy fetching, but we would probably need to adapt code in the many places
where we don't do lazy fetching today: "git log", "git merge", the commit
graph, etc. It's not a trivial amount of work.