Jonathan Nieder <jrnieder@xxxxxxxxx> writes:

> Junio C Hamano wrote:
>> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:
>
>>> One possibility to conceptually have the same thing without the overhead
>>> of the list is to put the obtained-from-elsewhere objects into its own
>>> alternate object store, so that we can distinguish the two.
>>
>> Now you are talking. Either a separate object store, or a packfile
>> that is specially marked as such, would work.
>
> Jonathan's not in today, so let me say a few more words about this
> approach.
>
> This approach implies a relaxed connectivity guarantee, by creating
> two classes of objects:
>
>  1. Objects that I made should satisfy the connectivity check. They
>     can point to other objects I made, objects I fetched, or (*) objects
>     pointed to directly by objects I fetched. More on (*) below.

Or objects that are referred to by objects I fetched. If you
narrowly clone while omitting a subdirectory, update a file that is
outside the subdirectory, and create a new commit, while recording
the same tree object name for the directory whose contents you do
not know (because you didn't fetch it), then it is OK for the
top-level tree of the resulting commit you created to be pointing at
the tree that represents the subdirectory you never touched.

> The complication is in the "git gc" operation for the case (*).
> Today, "git gc" uses a reachability walk to decide which objects to
> remove --- an object referenced by no other object is fair game to
> remove. With (*), there is another kind of object that must not be
> removed: if an object that I made, M, points to a missing/promised
> object, O, pointed to by an object I fetched, F, then I cannot prune
> F unless there is another fetched object present to anchor O.

Absolutely. Lazy-objects support comes with certain cost and this
is one of them.

But I do not think it is realistic to expect that you can prune
anything you fetched from the "other place" (i.e.
the source 'lazy-objects' hook reads from). After all, once they
give out objects to their clients (like us in this case), they
cannot prune them, if we take the "implicit promise" approach to
avoid the cost to transmit and maintain a separate "object list".

> For example: suppose I have a sparse checkout and run
>
>	git fetch origin refs/pulls/x
>	git checkout -b topic FETCH_HEAD
>	echo "Some great modification" >> README
>	git add README
>	git commit --amend
>
> When I run "git gc", there is nothing pointing to the commit that was
> pointed to by the remote ref refs/pulls/x, so it can be pruned. I
> would naively also expect that the tree pointed to by that commit
> could be pruned. But pruning it means pruning the promise that made
> it permissible to lack various blobs that my topic branch refers to
> that are outside the sparse checkout area. So "git gc" must notice
> that it is not safe to prune that tree.
>
> This feels hacky. I prefer the promised object list over this
> approach.

I think they are moral equivalents implemented differently with
different assumptions. The example we are discussing makes an extra
assumption: In order to reduce the cost of transferring and
maintaining the list, we assume that all objects that came during
that transfer are implicitly "promised", i.e. everything behind each
of these objects will later be available on demand. How these
objects are marked is up to the exact mechanism (my preference is to
mark the resulting packfile as special; Jon Tan's message to which
my message was a response alluded to using an alternate object
store). If you choose to maintain a separate "object list" and have
the "other side" explicitly give it, perhaps you can lift that
assumption and replace it with some other assumption that assumes
less.

> Can you spell this out more? To be clear, are you speaking as a
> reviewer or as the project maintainer?
> In other words, if other
> reviewers are able to settle on a design that involves a relaxed
> guarantee for fsck in this mode that they can agree on, does this
> represent a veto meaning the patch can still not go through?

Consider it a veto over punting without making sure that we can
later come up with a solution to give such a guarantee. I am not
getting a feeling that "other reviewers" are even seeking a "relaxed
guarantee"---all I've seen in the thread is to give up any guarantee
and to hope for the best.