Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt

Christian Couder <christian.couder@xxxxxxxxx> · Wed, 30 Aug 2017 16:15:04 +0200

On Wed, Aug 30, 2017 at 2:50 PM, Ben Peart <peartben@xxxxxxxxx> wrote:
>
>
> On 8/29/2017 11:43 AM, Christian Couder wrote:
>>
>> On Mon, Aug 28, 2017 at 8:59 PM, Ben Peart <peartben@xxxxxxxxx> wrote:
>>>
>>>
>>> On 8/3/2017 5:19 AM, Christian Couder wrote:
>>>>
>>>>
>>>> +Helpers
>>>> +=======
>>>> +
>>>> +ODB helpers are commands that have to be registered using either the
>>>> +"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
>>>> +config variables.
>>>> +
>>>> +Registering such a command tells Git that an external odb called
>>>> +<odbname> exists and that the registered command should be used to
>>>> +communicate with it.
>>>
>>>
>>> What order are the odb handlers called? Are they called before or after
>>> the regular object store code for loose, pack and alternates?  Is the
>>> order configurable?
>>
>>
>> For get_*_object instructions the regular code is called before the odb
>> helpers.
>> (So the odb helper code is at the end of stat_sha1_file() and of
>> open_sha1_file() in sha1_file.c.)
>>
>> For put_*_object instructions the regular code is called after the odb
>> helpers.
>> (So the odb helper code is at the beginning of write_sha1_file() in
>> sha1_file.c.)
>>
>> And no, this order is not configurable, but of course it could be made
>> configurable.
>>
>>>> + - 'get_direct <sha1>'
>>>> +
>>>> +This instruction is similar to the other 'get_*' instructions except
>>>> +that no object should be sent from the helper to Git. Instead the
>>>> +helper should directly write the requested object into a loose object
>>>> +file in the ".git/objects" directory.
>>>> +
>>>> +After the helper has sent the "status=success" packet and the
>>>> +following flush packet in process mode, or after it has exited in the
>>>> +script mode, Git should look again for a loose object file with the
>>>> +requested sha1.
>>>
>>>
>>> When will git call get_direct vs one of the other get_* functions?
>>
>>
>> It is called just before exiting when git cannot find an object.
>> It is not exactly at the same place as other get_* instructions as I
>> tried to reuse your code and as it looks like it makes it easier to
>> retry the regular code after the odb helper code.
>>
>>> Could the
>>> functionality of enabling a helper to populate objects into the regular
>>> object store be provided by having an ODB helper that returned the object
>>> data as requested by get_git_obj or get_raw_obj but also stored it in the
>>> regular object store as a loose object (or pack file) for future calls?
>>
>>
>> I am not sure I understand what you mean.
>> If a helper returns the object data as requested by get_git_obj or
>> get_raw_obj, then currently Git will itself store the object locally
>> in its regular object store, so it is redundant for the helper to also
>> store or try to store the object in the regular object store.
>>
>
> Doesn't this mean that objects will "leak out" into the regular object store
> as they are used?  For example, at checkout, all objects in the requested
> commit would be retrieved from the various object stores and if they came
> from a "large blob" ODB handler, they would get retrieved from the ODB
> handler and then written to the regular object store (presumably as a loose
> object).  From then on, the object would be retrieved from the regular
> object store.
>
> This would seem to defeat the goal of enabling specialized object handlers
> to handle large or other "unusual" objects that git normally doesn't deal
> well with.

Yeah, I agree that storing the objects in the regular object store
should not be done in all cases.
There should be a way to control that.
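
To make the trade-off concrete, here is a minimal sketch in Python (not
Git's actual C code) of the read path discussed above: the regular object
store is consulted first, then the odb helpers, and a fetched object is
optionally written back to the regular store. The names ObjectStore,
read_object and the store_locally flag are all hypothetical and exist only
for illustration; no such option is part of this patch series.

```python
from typing import Callable, Dict, List, Optional

# A helper is modeled as a function: sha1 -> object data, or None if unknown.
Helper = Callable[[str], Optional[bytes]]


class ObjectStore:
    """Toy stand-in for the regular object store (loose, pack, alternates)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def get(self, sha1: str) -> Optional[bytes]:
        return self.objects.get(sha1)

    def put(self, sha1: str, data: bytes) -> None:
        self.objects[sha1] = data


def read_object(
    sha1: str,
    local: ObjectStore,
    helpers: List[Helper],
    store_locally: bool = True,  # hypothetical knob, not an existing setting
) -> Optional[bytes]:
    # The regular object store is tried before any odb helper.
    data = local.get(sha1)
    if data is not None:
        return data

    # Fall back to the odb helpers, in registration order (an assumption).
    for helper in helpers:
        data = helper(sha1)
        if data is not None:
            if store_locally:
                # Current behavior as described: the object "leaks" into
                # the regular store and is served from there afterwards.
                local.put(sha1, data)
            return data

    return None
```

With store_locally=True a second read of the same sha1 never reaches the
helper again, which is the "leak" Ben describes. With store_locally=False
every read goes back to the helper and the regular store stays free of the
large objects.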