Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt

Ben Peart <peartben@xxxxxxxxx> · Mon, 28 Aug 2017 14:59:55 -0400

On 8/3/2017 5:19 AM, Christian Couder wrote:
This describes the external odb mechanism's purpose and
how it works.

Helped-by: Ben Peart <benpeart@xxxxxxxxxxxxx>
Signed-off-by: Christian Couder <chriscool@xxxxxxxxxxxxx>
---
  Documentation/technical/external-odb.txt | 295 +++++++++++++++++++++++++++++++
  1 file changed, 295 insertions(+)
  create mode 100644 Documentation/technical/external-odb.txt

diff --git a/Documentation/technical/external-odb.txt b/Documentation/technical/external-odb.txt
new file mode 100644
index 0000000000..5991221fd5
--- /dev/null
+++ b/Documentation/technical/external-odb.txt
@@ -0,0 +1,295 @@
+External ODBs
+^^^^^^^^^^^^^
+
+The External ODB mechanism makes it possible for Git objects, mostly
+blobs for now though, to be stored in an "external object database"
+(External ODB).
+
+An External ODB can be any object store as long as there is a helper
+program called an "odb helper" that can communicate with Git to
+transfer objects to/from the external odb and to retrieve information
+about available objects in the external odb.
+
+Purpose
+=======
+
+The purpose of this mechanism is to make it possible to handle Git
+objects, especially blobs, in much more flexible ways.
+
+Currently Git can store its objects only in the form of loose objects
+in separate files or packed objects in a pack file.
+
+This is not flexible enough for some important use cases like handling
+very large binary files or handling a very large number of files that
+are fetched only as needed. And it is not realistic to expect that Git
+could handle many such use cases fully natively.
+
+Furthermore, many improvements that depend on specific setups could
+be implemented in the way Git objects are managed if it were possible
+to customize how the Git objects are handled. For example, a
+restartable clone using the bundle mechanism has often been requested,
+but implementing that would go against the strict rules under which
+Git objects are currently handled.
+
+What Git needs is a mechanism that makes it possible to customize in
+many different ways how the Git objects are handled. This mechanism
+should, however, avoid as much as possible interfering with the
+usual way in which Git handles its objects.
+
+Helpers
+=======
+
+ODB helpers are commands that have to be registered using either the
+"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
+config variables.
+
+Registering such a command tells Git that an external odb called
+<odbname> exists and that the registered command should be used to
+communicate with it.
+

In what order are the odb handlers called? Are they called before or after 
the regular object store code for loose, pack and alternates?  Is the 
order configurable?

[...]
+
+ - 'get_direct <sha1>'
+
+This instruction is similar to the other 'get_*' instructions except
+that no object should be sent from the helper to Git. Instead the
+helper should directly write the requested object into a loose object
+file in the ".git/objects" directory.
+
+After the helper has sent the "status=success" packet and the
+following flush packet in process mode, or after it has exited in the
+script mode, Git should look again for a loose object file with the
+requested sha1.

When will git call get_direct vs one of the other get_* functions? Could 
the functionality of enabling a helper to populate objects into the 
regular object store be provided by having an ODB helper that returned 
the object data as requested by get_git_obj or get_raw_obj but also 
stored it in the regular object store as a loose object (or pack file) 
for future calls?
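
To make the question concrete, here is a minimal sketch (Python, not part of 
the patch) of the write a 'get_direct' helper would have to perform itself. 
It assumes the standard loose object format: the zlib-deflated bytes of 
"<type> <size>\0<data>", stored under a directory named after the first two 
hex digits of the SHA-1. The function name write_loose_object and the 
temporary-file naming are hypothetical.

```python
import hashlib
import os
import zlib


def write_loose_object(git_dir, data, obj_type="blob"):
    """Store `data` as a loose object under <git_dir>/objects.

    Returns the hex SHA-1 of the object. Illustrative only: a real
    helper would also need to handle permissions and concurrent writers.
    """
    # A loose object is "<type> <size>\0" followed by the raw content.
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    store = header + data
    sha1 = hashlib.sha1(store).hexdigest()

    # Objects live at objects/<first 2 hex digits>/<remaining 38>.
    obj_dir = os.path.join(git_dir, "objects", sha1[:2])
    path = os.path.join(obj_dir, sha1[2:])

    if not os.path.exists(path):
        os.makedirs(obj_dir, exist_ok=True)
        # Write to a temporary name first so a reader never sees a
        # partially written object, then rename into place.
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(zlib.compress(store))
        os.replace(tmp, path)
    return sha1
```

Either approach ends with the same file on disk; the difference is only 
whether Git or the helper performs this write.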