On Mon, 31 Jul 2017 14:21:56 -0700 Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:
>
> > Besides review changes, this patch set now includes my rewritten
> > lazy-loading sha1_file patch, so you can now do this (excerpted from
> > one of the tests):
> >
> > test_create_repo server
> > test_commit -C server 1 1.t abcdefgh
> > HASH=$(git hash-object server/1.t)
> >
> > test_create_repo client
> > test_must_fail git -C client cat-file -p "$HASH"
> > git -C client config core.repositoryformatversion 1
> > git -C client config extensions.lazyobject \
> > 	"\"$TEST_DIRECTORY/t0410/lazy-object\" \"$(pwd)/server/.git\""
> > git -C client cat-file -p "$HASH"
> >
> > with fsck still working. Also, there is no need for a list of promised
> > blobs, and the long-running process protocol is being used.
>
> I do not think I read your response to my last comment on this
> series, so I could be missing something large, but I am afraid that
> the resulting fsck is only half as useful as the normal fsck. I do
> not see it any better than a hypothetical castrated version of fsck
> that only checks the integrity of objects that appear in the local
> object store, without doing any connectivity checks.

Sorry, I haven't replied to your last response [1]. That does sound
like a good idea, though, and it probably can be extended to trees and
blobs: we need to make sure that any object referenced from local-only
commits (calculated as you describe in [1]) can be obtained through an
object walk from a remote-tracking branch. I haven't fully thought
through the implications of things like building commits on top of an
arbitrary upstream commit (since our upstream commit is not a tip, the
object walk from all remote-tracking branches might not reach it).
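As a concrete illustration of the "local-only commits" set I mean (a
rough sketch against a throwaway repository, where the hand-made
origin/master ref stands in for a real remote-tracking branch; the
exact definition in [1] may differ):

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git -c user.name=t -c user.email=t@example.com \
	commit -q --allow-empty -m base
# Pretend the remote has everything up to "base":
git update-ref refs/remotes/origin/master HEAD
git -c user.name=t -c user.email=t@example.com \
	commit -q --allow-empty -m local-only
# Local-only commits: reachable from local branches but not from any
# remote-tracking ref.
echo "local-only commits: $(git rev-list --count --branches --not --remotes)"
```

Adding --objects to the same rev-list invocation would also list the
trees and blobs that only those commits introduce, which is roughly the
set whose availability we would need to verify.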
To try to solve that, we could use an alternate object store to store
remote objects, so that remote objects can be found quickly without
doing a traversal. But that does not fully solve the problem, because
some information about remote object possession lies only in their
parents: for example, if we don't have a remote blob, sometimes the
only way to know that the remote has it is by having a tree containing
that blob. In addition, this couples the lazy object loading with
either a remote ref (or all remote refs, if we decide to consider
objects from all remote refs as potentially loadable). I'll think
about this further.

[1] https://public-inbox.org/git/xmqq379fkz4x.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/

> Don't get me wrong. The integrity check on local objects you still
> do is important---that is what allows us to make sure that the local
> "cache" does not prevent us from going to the real source of the
> remote object store by having a corrupt copy.
>
> But not being able to tell if a missing object is OK to be missing
> (because we can get them if/as needed from elsewhere) or we lost the
> sole copy of an object that we created and have not pushed out
> (hence we are in deep yogurt) makes it pretty much pointless to run
> "fsck", doesn't it? It does not give us any guarantee that our
> repository plus perfect network connectivity gives us an environment
> to build further work on.
>
> Or am I missing something fundamental?

Well, fsck can still detect issues like corrupt objects (as you mention
above) and dangling heads, which might be real issues. But it is true
that it does not give you the guarantee you describe.

From a user standpoint, this might be worked around by providing a
network-requiring object connectivity checking tool, or by just having
the user run a build to ensure that all necessary files are present.
Having said that, this feature would be very nice to have.
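For what it's worth, such a network-requiring check could be little
more than walking every reachable object and asking the object store
about each one: with extensions.lazyobject configured, each lookup may
go to the network, so anything still reported missing afterwards is
truly lost. A sketch against a throwaway repository (without lazy
loading actually configured here, so this only demonstrates the local
half of the check):

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo abcdefgh > 1.t
git add 1.t
git -c user.name=t -c user.email=t@example.com commit -qm 1
# Walk all reachable objects and ask the object store about each one.
# cat-file --batch-check prints "<oid> missing" for an absent object;
# in a lazy-object repository the lookup itself could trigger a fetch,
# so a "missing" line here would mean truly unrecoverable.
missing=$(git rev-list --objects --all | awk '{print $1}' |
	git cat-file --batch-check | grep -c ' missing') || true
echo "missing objects: $missing"
```

This still couples the check to network availability, of course, which
is exactly the trade-off discussed above.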