On Thu, Aug 3, 2017 at 11:40 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
>> This implements the 'get_direct' capability/instruction that makes
>> it possible for external odb helper scripts to pass blobs to Git
>> by directly writing them as loose object files.
>
> I am not sure if the assumption is made clear in this series, but I
> am (perhaps incorrectly) guessing that it is assumed that the
> intended use of this feature is to offload access to large blobs
> by not including them in the initial clone.

Yeah, it could be used for that, but that's not the only interesting
use case. It could also be used, for example, when the working tree
contains a huge number of blobs and it is better to download only the
blobs that are needed, when they are needed.

In fact the code for 'get_direct' was taken from Ben Peart's
"read-object" patch series (actually from an earlier version of that
series):

https://public-inbox.org/git/20170714132651.170708-1-benpeart@xxxxxxxxxxxxx/

> So from that point of
> view, I think it makes tons of sense to let the external helper
> directly populate the database bypassing Git (i.e. instead of
> feeding a data stream and having Git store it) like this "direct"
> method does.
>
> How does this compare with (and how well does this work with) what
> Jonathan Tan is doing recently?

From the following email:

https://public-inbox.org/git/20170804145113.5ceafafa@xxxxxxxxxxxxxxxxxxxxxxxxxxx/

it looks like his work is fundamentally about changing the rules of
connectivity checks. Objects are split between "homegrown" objects
and "imported" objects, which are kept in separate pack files, and
references to imported objects are then not checked during the
connectivity check.

I think changing the connectivity rules is not necessary to make
something like the external odb work. For example, when fetching a
pack that refers to objects that are in an external odb, if access to
this external odb has been configured, then the connectivity check
will pass, as the missing objects in the pack will be seen as already
part of the repo.

Yeah, if some commands like fsck are used, then possibly all the
objects will have to be requested from the external odb, as it may
not be possible to fully check all the objects, especially the blobs,
without accessing all their data. But I think this is a problem that
could be dealt with in different ways. For example we could develop
specific options in fsck so that it doesn't check the sha1 of objects
that are marked with some specific attributes, or that are stored in
external odbs, or that are bigger than some size.
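
Going back to the 'get_direct' capability quoted at the top, here is
a rough Python sketch of what "directly writing blobs as loose object
files" means at the object database level. The function and its
interface are made up for illustration; only the loose object format
itself is the real one Git uses (a "blob <size>" header, a NUL byte,
then the content, all deflated with zlib and stored under
.git/objects/<2 hex chars>/<38 hex chars>):

  import hashlib
  import os
  import zlib

  def write_loose_blob(gitdir, data):
      # Loose blob format: "blob <size>\0" header followed by the
      # content; the object name is the sha1 of header + content.
      store = b"blob %d\x00" % len(data) + data
      sha1 = hashlib.sha1(store).hexdigest()
      path = os.path.join(gitdir, "objects", sha1[:2], sha1[2:])
      os.makedirs(os.path.dirname(path), exist_ok=True)
      if not os.path.exists(path):
          # The whole deflated store, header included, goes on disk.
          with open(path, "wb") as f:
              f.write(zlib.compress(store))
      return sha1

For example write_loose_blob(".git", b"hello\n") stores the same blob
that "git hash-object -w --stdin" would create for that content, and
"git cat-file blob <sha1>" can then read it back. A real helper would
of course also write to a temporary file first and rename it into
place, the way Git itself does, to avoid exposing partially written
objects.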
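
The connectivity check argument above could be summarized with
pseudocode like this, where all the names are made up and this is not
how the code is actually structured:

  def has_object(sha1, local_odb, external_odbs):
      # An object counts as "part of the repo" if the local object
      # store has it, or if a configured external odb says it has it,
      # so a pack referring to offloaded blobs still passes the
      # connectivity check without changing the connectivity rules.
      if local_odb.has(sha1):
          return True
      return any(odb.has(sha1) for odb in external_odbs)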
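
Finally, the kind of fsck options I am thinking about might look
something like this on the command line; both option names are
completely made up and don't exist anywhere yet:

  # Hypothetical options, for illustration only: skip sha1 checking
  # of objects stored in an external odb or bigger than a given size.
  git fsck --skip-external-odb-objects --skip-objects-larger-than=100M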