Re: [PATCH v2 0/5] Fsck for lazy objects, and (now) actual invocation of loader

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 02 Aug 2017 13:51:37 -0700

Jonathan Nieder <jrnieder@xxxxxxxxx> writes:

> Junio C Hamano wrote:
>> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:
>
>>> One possibility to conceptually have the same thing without the overhead
>>> of the list is to put the obtained-from-elsewhere objects into its own
>>> alternate object store, so that we can distinguish the two.
>>
>> Now you are talking.  Either a separate object store, or a packfile
>> that is specially marked as such, would work.
>
> Jonathan's not in today, so let me say a few more words about this
> approach.
>
> This approach implies a relaxed connectivity guarantee, by creating
> two classes of objects:
>
>  1. Objects that I made should satisfy the connectivity check.  They
>     can point to other objects I made, objects I fetched, or (*) objects
>     pointed to directly by objects I fetched.  More on (*) below.

Or objects that are referred to by objects I fetched.  If you
narrowly clone while omitting a subdirectory, update a file
that is outside the subdirectory, and create a new commit while
recording the same tree object name for the directory whose contents
you do not know (because you didn't fetch them), then it is OK for
the top-level tree of the resulting commit you created to point
at the tree that represents the subdirectory you never touched.
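
To make that rule concrete, here is a toy sketch in Python.  It is
not git code; the object names, the "refs" map and the "fetched" set
are all made up for illustration.  A link from an object I made is
fine if its target is either present locally or named by some object
that came from the fetch.

```python
# Toy model, not git internals: every name here is hypothetical.
# refs maps an object name to the names it points at; only objects
# that are present in the local store appear as keys.
refs = {
    "fetched-commit": ["fetched-root"],
    "fetched-root": ["blob-readme-old", "tree-subdir"],  # tree-subdir was never fetched
    "blob-readme-old": [],
    "my-commit": ["my-root"],
    "my-root": ["blob-readme-new", "tree-subdir"],       # reuses the same tree name
    "blob-readme-new": [],
}
fetched = {"fetched-commit", "fetched-root", "blob-readme-old"}


def promised(refs, fetched):
    """Names referred to by any fetched object (the implicit promise)."""
    return {target for obj in fetched for target in refs[obj]}


def dangling(refs, fetched):
    """Links from locally made objects that are neither present nor promised."""
    ok = set(refs) | promised(refs, fetched)
    return [(obj, target)
            for obj in refs if obj not in fetched
            for target in refs[obj] if target not in ok]


# "my-root" may name "tree-subdir" only because "fetched-root" names it too.
print(dangling(refs, fetched))
```

If "my-root" instead named a tree that no fetched object refers to,
that link would show up in the result and the check would fail.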

> The complication is in the "git gc" operation for the case (*).
> Today, "git gc" uses a reachability walk to decide which objects to
> remove --- an object referenced by no other object is fair game to
> remove.  With (*), there is another kind of object that must not be
> removed: if an object that I made, M, points to a missing/promised
> object, O, pointed to by an object I fetched, F, then I cannot prune
> F unless there is another fetched object present to anchor O.

Absolutely.  Lazy-objects support comes with certain cost and this
is one of them.  

But I do not think it is realistic to expect that you can prune
anything you fetched from the "other place" (i.e. the source
'lazy-objects' hook reads from).  After all, once they give out
objects to their clients (like us in this case), they cannot prune
them, if we take the "implicit promise" approach to avoid the cost to
transmit and maintain a separate "object list".
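
As a rough illustration of what that means on our side (again a toy
Python model with made-up names, not how "git gc" is implemented): the
reachability walk still decides which locally made objects can go,
but anything that came from the other place is never a candidate.

```python
# Toy model, not git internals: every name here is hypothetical.
# Only objects present locally appear as keys; "O" is missing/promised.
refs = {
    "F-commit": ["F-tree"],
    "F-tree": ["O"],           # fetched tree that anchors the promise for O
    "M-commit": ["M-tree"],
    "M-tree": ["O"],           # locally made tree relying on that promise
    "old-commit": ["old-tree"],
    "old-tree": [],
}
fetched = {"F-commit", "F-tree"}
tips = ["M-commit"]            # refs we currently have; nothing points at F-commit


def prunable(refs, fetched, tips):
    """Locally made objects unreachable from the tips; fetched ones are kept."""
    seen, stack = set(), list(tips)
    while stack:
        obj = stack.pop()
        if obj in seen or obj not in refs:  # already visited, or missing (promised)
            continue
        seen.add(obj)
        stack.extend(refs[obj])
    return {obj for obj in refs if obj not in seen and obj not in fetched}


print(sorted(prunable(refs, fetched, tips)))
```

Here "F-commit" and "F-tree" are unreachable from any ref but survive
anyway, so the promise that makes it OK for "M-tree" to name "O" is
never lost.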

> For example: suppose I have a sparse checkout and run
>
> 	git fetch origin refs/pulls/x
> 	git checkout -b topic FETCH_HEAD
> 	echo "Some great modification" >> README
> 	git add README
> 	git commit --amend
>
> When I run "git gc", there is nothing pointing to the commit that was
> pointed to by the remote ref refs/pulls/x, so it can be pruned.  I
> would naively also expect that the tree pointed to by that commit
> could be pruned.  But pruning it means pruning the promise that made
> it permissible to lack various blobs that my topic branch refers to
> that are outside the sparse checkout area.  So "git gc" must notice
> that it is not safe to prune that tree.
>
> This feels hacky.  I prefer the promised object list over this
> approach.

I think they are moral equivalents implemented differently with
different assumptions.  The example we are discussing makes an extra
assumption: In order to reduce the cost of transferring and
maintaining the list, we assume that all objects that came during
that transfer are implicitly "promised", i.e. everything behind each
of these objects will later be available on demand.  How these
objects are marked is up to the exact mechanism (my preference is to
mark the resulting packfile as special; Jon Tan's message to which
my message was a response alluded to using an alternate object
store).  If you choose to maintain a separate "object list" and have
the "other side" explicitly give it, perhaps you can lift that
assumption and replace it with some other assumption that assumes
less.

> Can you spell this out more?  To be clear, are you speaking as a
> reviewer or as the project maintainer?  In other words, if other
> reviewers are able to settle on a design that involves a relaxed
> guarantee for fsck in this mode that they can agree on, does this
> represent a veto meaning the patch can still not go through?

Consider it a veto over punting without making sure that we can
later come up with a solution to give such a guarantee.  I am not
getting a feeling that "other reviewers" are even seeking a "relaxed
guarantee"---all I've seen in the thread is to give up any guarantee
and to hope for the best.