Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt

Ben Peart <peartben@xxxxxxxxx> · Wed, 30 Aug 2017 08:50:52 -0400

On 8/29/2017 11:43 AM, Christian Couder wrote:
On Mon, Aug 28, 2017 at 8:59 PM, Ben Peart <peartben@xxxxxxxxx> wrote:

On 8/3/2017 5:19 AM, Christian Couder wrote:

+Helpers
+=======
+
+ODB helpers are commands that have to be registered using either the
+"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
+config variables.
+
+Registering such a command tells Git that an external odb called
+<odbname> exists and that the registered command should be used to
+communicate with it.

In what order are the odb handlers called? Are they called before or after the
regular object store code for loose, pack and alternates? Is the order
configurable?

For get_*_object instructions the regular code is called before the odb helpers.
(So the odb helper code is at the end of stat_sha1_file() and of
open_sha1_file() in sha1_file.c.)

For put_*_object instructions the regular code is called after the odb helpers.
(So the odb helper code is at the beginning of write_sha1_file() in
sha1_file.c.)

And no, this order is not configurable, but of course it could be made
configurable.
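A minimal C model of the ordering described above, for illustration only. Every name here (get_object, put_object, note, trace) is hypothetical and does not exist in sha1_file.c. It also assumes that when a helper accepts a put, the regular write is skipped; the reply above does not say that explicitly.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical model of the lookup order; not Git code.
 * Each backend appends its name to a trace so the order in which
 * the backends are consulted can be checked.
 */
static char trace[128];

static void reset_trace(void)
{
	trace[0] = '\0';
}

static void note(const char *backend)
{
	/* Guard the fixed-size buffer: room for a comma and the NUL. */
	assert(strlen(trace) + strlen(backend) + 2 <= sizeof(trace));
	if (trace[0])
		strcat(trace, ",");
	strcat(trace, backend);
}

/* get_*: loose, pack and alternates first, odb helpers as a fallback. */
static int get_object(int found_locally)
{
	note("regular");
	if (found_locally)
		return 1;
	note("helper");
	return 1; /* assumption: the helper has the object */
}

/* put_*: odb helpers first, then the regular write code. */
static void put_object(int helper_accepts)
{
	note("helper");
	if (helper_accepts)
		return; /* assumption: an accepted object is not also written locally */
	note("regular");
}
```

A read of an object already in a pack only ever touches the regular store, while a write always gives the helpers the first look.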

+ - 'get_direct <sha1>'
+
+This instruction is similar to the other 'get_*' instructions except
+that no object should be sent from the helper to Git. Instead the
+helper should directly write the requested object into a loose object
+file in the ".git/objects" directory.
+
+After the helper has sent the "status=success" packet and the
+following flush packet in process mode, or after it has exited in
+script mode, Git should look up the loose object file with the
+requested sha1 again.
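To make the get_direct contract concrete, here is a rough helper-side sketch in C. loose_object_path and pkt_line are hypothetical names. The path layout (first two hex digits as a directory, the remaining 38 as the file name) and the pkt-line framing (four hex digits giving the total length including the prefix, "0000" for flush) are Git's standard formats. The trailing LF on "status=success" assumes the helper follows the same convention as Git's long-running filter process protocol. The file written at that path must still hold the zlib-deflated "<type> <size>\0<data>" loose object, which this sketch does not produce.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: build <objdir>/<2 hex digits>/<38 hex digits>.
 * Returns 0 on success, -1 on a malformed sha1 or a too-small buffer.
 */
static int loose_object_path(char *buf, size_t size,
			     const char *objdir, const char *hex)
{
	int n;

	if (strlen(hex) != 40 || strspn(hex, "0123456789abcdef") != 40)
		return -1;
	n = snprintf(buf, size, "%s/%.2s/%s", objdir, hex, hex + 2);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Hypothetical: encode one pkt-line, i.e. 4 lowercase hex digits with
 * the total length (prefix included) followed by the payload.
 * Returns the number of bytes written, or -1 on error.
 */
static int pkt_line(char *buf, size_t size, const char *payload)
{
	size_t len = strlen(payload) + 4;
	int n;

	assert(payload != NULL);
	if (len > 0xffff) /* length must fit in four hex digits */
		return -1;
	n = snprintf(buf, size, "%04x%s", (unsigned)len, payload);
	return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* A flush packet is the special length "0000" with no payload. */
static const char FLUSH_PKT[] = "0000";
```

In process mode the helper would write the object file first, then send pkt_line("status=success\n") followed by FLUSH_PKT, so that Git's second lookup finds the loose object.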

When will git call get_direct vs one of the other get_* functions?

It is called just before exiting, when git cannot find an object.
It is not called at exactly the same place as the other get_* instructions,
because I tried to reuse your code and because doing it this way seems to
make it easier to retry the regular code after the odb helper code.

Could the
functionality of enabling a helper to populate objects into the regular
object store be provided by having an ODB helper that returned the object
data as requested by get_git_obj or get_raw_obj but also stored it in the
regular object store as a loose object (or pack file) for future calls?

I am not sure I understand what you mean.
If a helper returns the object data as requested by get_git_obj or
get_raw_obj, then currently Git will itself store the object locally
in its regular object store, so it is redundant for the helper to also
store or try to store the object in the regular object store.

Doesn't this mean that objects will "leak out" into the regular object 
store as they are used?  For example, at checkout, all objects in the 
requested commit would be retrieved from the various object stores and 
if they came from a "large blob" ODB handler, they would get retrieved 
from the ODB handler and then written to the regular object store 
(presumably as a loose object).  From then on, the object would be 
retrieved from the regular object store.
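The "leak out" effect can be shown with a small read-through cache model in C. Everything here (struct store, read_object, helper_fetches) is hypothetical and only mirrors the behaviour described in this thread: on a miss Git asks the odb helper, then writes the result into its regular object store, so later reads never reach the helper.

```c
#include <assert.h>
#include <string.h>

#define MAX_OBJECTS 16

/* Hypothetical stand-in for the regular object store. */
struct store {
	const char *ids[MAX_OBJECTS];
	int count;
};

static int helper_fetches; /* how many times the odb helper was asked */

static int store_has(const struct store *s, const char *id)
{
	for (int i = 0; i < s->count; i++)
		if (!strcmp(s->ids[i], id))
			return 1;
	return 0;
}

static int read_object(struct store *local, const char *id)
{
	assert(id != NULL);
	if (store_has(local, id))
		return 1;                /* served from the regular store */
	helper_fetches++;                /* miss: fall back to the odb helper */
	if (local->count < MAX_OBJECTS)
		local->ids[local->count++] = id; /* written back locally */
	return 1;
}
```

After one checkout touches a large blob, helper_fetches stays at 1 no matter how often the blob is read again, and the blob now occupies space in the regular store.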

This would seem to defeat the goal of enabling specialized object 
handlers to handle large or other "unusual" objects that git normally 
doesn't deal well with.