Han Young <hanyang.tony@xxxxxxxxxxxxx> writes: > Here are the minimal steps to recreate issue. [snip] I think the following is what is happening. Before the final gc, the repo looks as follows: commit tree blob C3 ---- T3 -- B3 (fetched from remote, in promisor pack) | C2 ---- T2 -- B2 (created locally, in non-promisor pack) | C1 ---- T1 -- B1 (fetched from remote, in promisor pack) After the final gc, {C,T,B}3 and {C,T,B}1 are in a promisor pack, but all of {C,T,B}2 are deleted because they are thought to be promisor objects. > The last `git gc` will error out on fsck with error message like this: > > error: Could not read d3fbfea9e448461c2b72a79a95a220ae10defd94 > error: Could not read d3fbfea9e448461c2b72a79a95a220ae10defd94 I'm not sure how `git gc` (or `git fsck`) knows the name of this object (what is the type of this object, and which object refers to this object?) but I think that if we implement one of the solutions I describe below, this problem will go away. > `git gc` will call `git repack`, which will call `git pack-objects` > twice on a partially cloned repo. The first call to pack-objects > combines all the promisor packfiles, and the second pack-objects > command packs all reachable non-promisor objects into a normal packfile. Yes, this is what I remember. > However, a bug in setup_revision caused some non-promisor objects > to be mistakenly marked as in promisor packfiles in the second call > to pack-objects. I think they ({C,T,B}2 in the example above) should be considered as promisor objects, actually. From the partial clone doc, it says of a promisor object: "the local repository has that object in one of its promisor packfiles, or because another promisor object refers to it". C3 (in a promisor packfile) refers to C2, so C2 is a promisor object. It refers to T2, which refers to B2, so all of them are promisor objects. However, in the Git code, I don't think this definition is applied recursively - if I remember correctly, we made the assumption (e.g. in is_promisor_object() in packfile.c) that we only need to care about objects in promisor packfiles and the objects they directly reference, because how would we know what objects they indirectly reference? But this does not cover the case in which we know what objects they indirectly reference because we pushed them to the server in the first place. Solutions I can think of: - When fetching from a promisor remote, never declare commits that are not in promisor packfiles as HAVE. This means that we would refetch C2 and T2 as being in promisor packfiles. But it's wasteful in network bandwidth, and does not repair problems in Git repos created by Git versions that do not have this solution. - When fetching from a promisor remote, parse every object and repack any local objects referenced (directly or indirectly) into a promisor packfile. Also does not repair problems. - When repacking all objects in promisor packfiles, if any object they refer to is present in a non-promisor packfile, do a revwalk on that object and pack those objects too. The repack will probably be slower because each object now has to be parsed. The revwalks themselves probably will not take too long, since they can stop at known promisor objects. (One other thing that might be considered is, whenever pushing to a promisor remote, to write the pack that's pushed as a promisor packfile. But I don't think this is a good idea - the server may not retain any packs that were pushed.)