Han Young <hanyang.tony@xxxxxxxxxxxxx> writes:

> On Wed, Oct 9, 2024 at 5:36 AM Calvin Wan <calvinwan@xxxxxxxxxx> wrote:
> >
> > Objects that are in promisor packs, specifically the ones that have
> > the flag, packed_git::pack_promisor, set. However, since this design
> > doc was sent out, it turns out the creation of a set of promisor
> > pack objects in a large repository (such as Android or Chrome) is
> > very expensive, so this design is infeasible in my opinion.
>
> I wonder if a set of local loose/pack objects would be cheaper to
> construct? Normally loose objects are always non-promisor objects,
> unless the user is running something like `unpack-objects`.

We had a similar idea at $JOB. Note that you don't actually need to
create the set - when looking up an object using
oid_object_info_extended(), we know whether it's a loose object and, if
not, which pack it is in. The pack has a promisor bit that we can check
(see the first sketch at the end of this message).

Note that there is a possibility of a false positive. If the same
object is in two packs - one promisor and one non-promisor - I believe
there's no guarantee as to which pack will be preferred. So we may see
that the object is in a non-promisor pack, but there's no guarantee
that it's not also in a promisor pack (the second sketch at the end
shows what an exhaustive check would look like). For the
omit-local-commits-in-"have" solution, this is a fatal flaw (we
absolutely must guarantee that we don't send any promisor commits), but
for the repack-on-fetch solution, it's no big deal (we are looking for
objects to repack into a promisor pack, and repacking a promisor object
into a promisor pack is perfectly fine). For this reason, I think the
repack-on-fetch solution is the most promising one so far.

Loose objects are always non-promisor objects, yes. (I don't think a
user running `unpack-objects` counts - running a command directly on a
packfile in the .git directory is out of scope, I think.)

> > > After a lazy clone that omits a lot of objects acquires many
> > > objects over time by fetching missing objects on demand, wouldn't
> > > we want to have an option to "slim" the local repository by
> > > discarding some of these objects (the ones that are least
> > > frequently used), relying on the promise by the promisor remote
> > > that even if we did so, they can be fetched again? Can we treat
> > > loss of C2a/C2b/C2 as if such a feature prematurely kicked in? Or
> > > are we failing to refetch them for some reason?
> >
> > Yes, if such a feature existed, it would be feasible and a possible
> > solution for this issue (I'm leaning quite towards this now after
> > testing out some of the other designs).
>
> Since no partial clone filter omits commit objects, we always assume
> commits are available in the codebase. `merge` reports "cannot merge
> unrelated history" if one of the commits is missing, instead of trying
> to fetch it.
> Another problem is that the current lazy fetching code does not report
> "haves" to the remote, so a lazy fetch of a commit ends up pulling all
> the trees and blobs associated with that commit.
> I also prefer the "fetching the missing objects" approach; making sure
> the repo has all the "correct" objects is difficult to get right.

If I remember correctly, our intention (or, at least, my intention) in
not treating missing commits differently was to avoid limiting the
solutions we could implement. For example, we had the idea of
server-assisted merge-base computation - this and other features would
make it feasible to omit commits locally. That has not been
implemented, though.
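
To make the single-lookup check concrete, here is a minimal, untested
sketch against Git's internal API roughly as it stood around this
thread (header names, OBJECT_INFO_SKIP_FETCH_OBJECT, and the exact
object_info fields may have drifted in your version):

    #include "git-compat-util.h"
    #include "object-store.h"
    #include "packfile.h"

    /*
     * Returns 1 if the lookup found the object in a promisor pack,
     * 0 if it was found loose or in a non-promisor pack, and -1 if
     * the object is not present locally. Because only the first hit
     * is inspected, a 0 result does not prove the object is absent
     * from every promisor pack (the false positive discussed above).
     */
    static int found_in_promisor_pack(struct repository *r,
                                      const struct object_id *oid)
    {
            struct object_info oi = OBJECT_INFO_INIT;

            /* Don't trigger a lazy fetch just to classify the object. */
            if (oid_object_info_extended(r, oid, &oi,
                                         OBJECT_INFO_SKIP_FETCH_OBJECT))
                    return -1;

            if (oi.whence == OI_PACKED && oi.u.packed.pack->pack_promisor)
                    return 1;
            return 0;
    }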
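
And a sketch of the exhaustive check that the
omit-local-commits-in-"have" solution would need, scanning every
promisor pack instead of trusting the first hit. I'm assuming
packfile.h's get_all_packs() and find_pack_entry_one() here;
find_pack_entry_one() took raw hash bytes at the time of writing, so
the signature may differ in your version:

    #include "git-compat-util.h"
    #include "object-store.h"
    #include "packfile.h"

    /*
     * Returns 1 if the object exists in at least one promisor pack,
     * 0 otherwise. Unlike the single lookup above, this cannot miss a
     * promisor copy of the object, but it probes every promisor
     * pack's index, which is the kind of cost that made building a
     * full set of promisor objects too expensive on repositories like
     * Android or Chrome.
     */
    static int in_any_promisor_pack(struct repository *r,
                                    const struct object_id *oid)
    {
            struct packed_git *p;

            for (p = get_all_packs(r); p; p = p->next) {
                    if (!p->pack_promisor)
                            continue;
                    if (find_pack_entry_one(oid->hash, p))
                            return 1;
            }
            return 0;
    }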