Re: [PATCH v5 25/40] external-odb: add 'get_direct' support

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Thu, 14 Sep 2017 11:19:45 -0700

On Thu, 14 Sep 2017 10:39:35 +0200
Christian Couder <christian.couder@xxxxxxxxx> wrote:

> From the following email:
> 
> https://public-inbox.org/git/20170804145113.5ceafafa@xxxxxxxxxxxxxxxxxxxxxxxxxxx/
> 
> it looks like his work is fundamentally about changing the rules of
> connectivity checks. Objects are split between "homegrown" objects and
> "imported" objects which are in separate pack files. Then references
> to imported objects are not checked during connectivity check.
> 
> I think changing connectivity rules is not necessary to make something
> like external odb work. For example when fetching a pack that refers
> to objects that are in an external odb, if access to this external odb
> has been configured, then the connectivity check will pass as the
> missing objects in the pack will be seen as already part of the repo.

There are still some nuances. For example, if an external ODB provides
both a tree and a blob that the tree references, do we fetch the tree in
order to call "have" on all its blobs, or do we trust that an ODB that
has the tree also has all the objects it references? In my design, I do the
latter, but in the general case where we have multiple ODBs, we might
have to do the former. (And if we do the former, it seems to me that the
connectivity check must be performed "online" - that is, with the ODBs
being able to provide "get".)
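
To make the two options concrete, here is a rough sketch of a
connectivity walk with a switch between "trust the ODB" and "verify
through the ODB". Everything in it is made up for illustration: Obj,
ToyODB, check_connectivity and the trust_odb flag are hypothetical names,
not anything in Git or in either patch series, and the toy "get" returns
a parsed object rather than raw data.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    kind: str            # "commit", "tree" or "blob"
    refs: tuple = ()     # IDs of the objects this object points to


@dataclass
class ToyODB:
    """Hypothetical stand-in for an external ODB with "have" and "get"."""
    objects: dict
    get_calls: int = 0

    def have(self, oid):
        return oid in self.objects

    def get(self, oid):
        self.get_calls += 1
        return self.objects[oid]


def check_connectivity(tips, local, odb, trust_odb):
    """Return the IDs reachable from tips that neither store can provide."""
    missing, seen, stack = set(), set(), list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid in local:
            stack.extend(local[oid].refs)
        elif not odb.have(oid):
            missing.add(oid)
        elif not trust_odb:
            # "Online" check: fetch the object only to learn what it references.
            stack.extend(odb.get(oid).refs)
        # Otherwise: assume the ODB also has everything this object references.
    return missing
```

With trust_odb=True the walk stops at the first object the ODB claims to
have and never calls "get". With trust_odb=False it has to "get" every
ODB-provided object it reaches in order to keep walking, which is why
that variant only works if the ODBs are reachable during the check.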

(Also, my work extends all the way to fetch/clone [1], but admittedly I
have been taking it a step at a time and recently have only been
discussing how the local repo should handle the missing object
situation.)

[1] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@xxxxxxxxxx/

> Yeah, if some commands like fsck are used, then possibly all the
> objects will have to be requested from the external odb, as it may not
> be possible to fully check all the objects, especially the blobs,
> without accessing all their data. But I think this is a problem that
> could be dealt with in different ways. For example we could develop
> specific options in fsck so that it doesn't check the sha1 of objects
> that are marked with some specific attributes, or that are stored in
> external odbs, or that are bigger than some size.

The hard part is in dealing with missing commits and trees, I think, not
blobs.
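
For reference, the kind of fsck option suggested in the quoted text could
look roughly like the sketch below. It is only an illustration: fsck_object
and its skip_external / max_size parameters are hypothetical and do not
correspond to existing fsck options. The only part taken from Git itself
is how an object ID is computed (SHA-1 over a "<type> <size>\0" header
followed by the content).

```python
import hashlib


def git_object_id(kind, data):
    """SHA-1 of the "<type> <size>" header, a NUL byte, then the content."""
    header = f"{kind} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def fsck_object(oid, kind, size, in_external_odb, fetch,
                skip_external=False, max_size=None):
    """Verify one object's hash unless a (hypothetical) skip rule applies.

    fetch is a callable returning the object's content; it is only called
    when the hash is actually verified, so skipped objects are never
    requested from the external ODB.
    """
    if skip_external and in_external_odb:
        return "skipped"
    if max_size is not None and size > max_size:
        return "skipped"
    return "ok" if git_object_id(kind, fetch()) == oid else "corrupt"
```

A rule like this is easy to apply to blobs, since nothing else depends on
their content. A skipped commit or tree is different: its content is what
tells the walk which objects come next.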