On 8/3/2017 5:19 AM, Christian Couder wrote:
This describes the external odb mechanism's purpose and
how it works.

Helped-by: Ben Peart <benpeart@xxxxxxxxxxxxx>
Signed-off-by: Christian Couder <chriscool@xxxxxxxxxxxxx>
---
 Documentation/technical/external-odb.txt | 295 +++++++++++++++++++++++++++++++
 1 file changed, 295 insertions(+)
 create mode 100644 Documentation/technical/external-odb.txt

diff --git a/Documentation/technical/external-odb.txt b/Documentation/technical/external-odb.txt
new file mode 100644
index 0000000000..5991221fd5
--- /dev/null
+++ b/Documentation/technical/external-odb.txt
@@ -0,0 +1,295 @@
+External ODBs
+^^^^^^^^^^^^^
+
+The External ODB mechanism makes it possible for Git objects, mostly
+blobs for now though, to be stored in an "external object database"
+(External ODB).
+
+An External ODB can be any object store as long as there is an helper
+program called an "odb helper" that can communicate with Git to
+transfer objects to/from the external odb and to retrieve information
+about available objects in the external odb.
+
+Purpose
+=======
+
+The purpose of this mechanism is to make possible to handle Git
+objects, especially blobs, in much more flexible ways.
+
+Currently Git can store its objects only in the form of loose objects
+in separate files or packed objects in a pack file.
+
+This is not flexible enough for some important use cases like handling
+really big binary files or handling a really big number of files that
+are fetched only as needed. And it is not realistic to expect that Git
+could fully natively handle many of such use cases.
+
+Furthermore many improvements that are dependent on specific setups
+could be implemented in the way Git objects are managed if it was
+possible to customize how the Git objects are handled. For example a
+restartable clone using the bundle mechanism has often been requested,
+but implementing that would go against the current strict rules under
+which the Git objects are currently handled.
+
+What Git needs a mechanism to make it possible to customize in a lot
+of different ways how the Git objects are handled. Though this
+mechanism should try as much as possible to avoid interfering with the
+usual way in which Git handle its objects.
+
+Helpers
+=======
+
+ODB helpers are commands that have to be registered using either the
+"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
+config variables.
+
+Registering such a command tells Git that an external odb called
+<odbname> exists and that the registered command should be used to
+communicate with it.
+
In what order are the odb helpers called? Are they called before or after the regular object store code for loose objects, packs and alternates? Is the order configurable?
[...]
+
+ - 'get_direct <sha1>'
+
+This instruction is similar as the other 'get_*' instructions except
+that no object should be sent from the helper to Git. Instead the
+helper should directly write the requested object into a loose object
+file in the ".git/objects" directory.
+
+After the helper has sent the "status=success" packet and the
+following flush packet in process mode, or after it has exited in the
+script mode, Git should lookup again for a loose object file with the
+requested sha1.
When will Git call get_direct vs. one of the other get_* instructions? Could the functionality of enabling a helper to populate objects into the regular object store be provided by having an ODB helper that returns the object data as requested by get_git_obj or get_raw_obj, but also stores it in the regular object store as a loose object (or pack file) for future calls?
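
For concreteness, here is roughly what "write the requested object into a loose object file" would involve on the helper side. This is only a minimal Python sketch, assuming the usual loose object layout (the zlib-compressed bytes "<type> <size>\0<content>", stored under objects/<first two hex digits of the sha1>/<remaining 38>). The function name, its parameters and the temp-file handling are hypothetical and not taken from the patch.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(git_dir, data, obj_type="blob"):
    """Store `data` as a loose object under git_dir/objects; return its sha1.

    Hypothetical helper-side function for illustration, not part of the patch.
    """
    # The object id is the SHA-1 of "<type> <size>\0<content>"; the same
    # bytes are zlib-compressed to form the file on disk.
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    store = header + data
    sha1 = hashlib.sha1(store).hexdigest()

    obj_dir = os.path.join(git_dir, "objects", sha1[:2])
    obj_path = os.path.join(obj_dir, sha1[2:])
    if os.path.exists(obj_path):
        return sha1  # already present, nothing to do

    os.makedirs(obj_dir, exist_ok=True)
    # Write to a temporary file and rename it into place, so that a
    # concurrent reader never sees a partially written object.
    fd, tmp_path = tempfile.mkstemp(dir=obj_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(zlib.compress(store))
        os.replace(tmp_path, obj_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return sha1
```

A helper answering get_raw_obj could call something like this just before sending the data back to Git, which is the alternative to get_direct raised above.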