Re: [PATCH v5 00/40] Add initial experimental external ODB support

Christian Couder <christian.couder@xxxxxxxxx> · Thu, 14 Sep 2017 09:02:54 +0200

On Sun, Sep 10, 2017 at 2:30 PM, Lars Schneider
<larsxschneider@xxxxxxxxx> wrote:
>
>> On 03 Aug 2017, at 10:18, Christian Couder <christian.couder@xxxxxxxxx> wrote:
>>
>> ...
>>
>> * The "helpers" (registered commands)
>>
>> Each helper manages access to one external ODB.
>>
>> There are 2 different modes for helpers:
>>
>>  - Helpers configured using "odb.<odbname>.scriptCommand" are
>>    launched each time Git wants to communicate with the <odbname>
>>    external ODB. This is called "script mode".
>>
>>  - Helpers configured using "odb.<odbname>.subprocessCommand" are
>>    launched once as a sub-process (using sub-process.h), and
>>    Git communicates with them using packet lines. This is called
>>    "process mode".
>
> I am curious, why would we support two modes? Wouldn't that increase
> the maintenance cost? Wouldn't the subprocess command be superior?
> I imagine the script mode eases testing, right?!

The script mode makes some helpers much easier to write. For
example, as shown in t0430 at the end of the patch series, a helper
for a restartable bundle-based clone could basically look like this:

die() {
    printf >&2 '%s\n' "$*"
    exit 1
}

case "$1" in
init)
    # Read the blob the odb ref points to; it holds the bundle URL
    ref_hash=$(git rev-parse refs/odbs/magic/bundle) ||
        die "couldn't find refs/odbs/magic/bundle"
    GIT_NO_EXTERNAL_ODB=1 git cat-file blob "$ref_hash" >bundle_info ||
        die "couldn't get blob $ref_hash"
    bundle_url=$(sed -e 's/bundle url: //' bundle_info)
    # Download the bundle, then import its objects and refs
    curl "$bundle_url" -o bundle_file ||
        die "curl '$bundle_url' failed"
    GIT_NO_EXTERNAL_ODB=1 git bundle unbundle bundle_file >unbundling_info ||
        die "unbundling 'bundle_file' failed"
    ;;
esac
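To see what the "init" step does without a network or the external ODB patches, here is a minimal local simulation using stock Git only. It assumes the blob behind the odb ref holds a single "bundle url: <url>" line (which is what the sed call above expects), uses a made-up file:// URL, and replaces curl with cp; the repository names src and dst are illustrative.

```shell
#!/bin/sh
# Local simulation of the "init" step above (no curl, no external ODB).
# Assumption: the odb blob holds one "bundle url: <url>" line.
set -e
work=$(mktemp -d)
cd "$work"

# A source repository with one commit, packed into a bundle
git init -q src
git -C src -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m initial
git -C src bundle create ../repo.bundle HEAD

# The kind of content the odb ref's blob is assumed to carry
printf 'bundle url: file://%s/repo.bundle\n' "$work" >bundle_info
bundle_url=$(sed -e 's/bundle url: //' bundle_info)

# cp stands in for the curl download done by the real helper
cp "${bundle_url#file://}" bundle_file

# Import the bundle's objects into a fresh repository; unbundle
# prints one "<object id> <refname>" line per ref in the bundle
git init -q dst
git -C dst bundle unbundle ../bundle_file >unbundling_info
cat unbundling_info
```

After this runs, dst contains the commit from src, and unbundling_info lists the bundle's refs, which a helper could then use to update refs.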

>> Each of these odb refs points to a blob that is stored in the Git
>> repository and contains information about the blob stored in the
>> external odb. This information can be specific to the external odb.
>> The repos can then share this information using commands like:
>>
>> `git fetch origin "refs/odbs/<odbname>/*:refs/odbs/<odbname>/*"`
>>
>> At the end of the current patch series, "git clone" is taught a
>> "--initial-refspec" option, that asks it to first fetch some specified
>> refs. This is used in the tests to fetch the odb refs first.
>>
>> This way a single "git clone" command can set up a repo using the
>> external ODB mechanism as long as the right helper is installed on the
>> machine and as long as the following options are used:
>>
>>  - "--initial-refspec <odbrefspec>" to fetch the odb refspec
>>  - "-c odb.<odbname>.command=<helper>" to configure the helper
>
> The "odb" config could, of course, go into the global git config.

Sure.
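For example, keeping the helper in the global config might look like the sketch below. It uses a throwaway HOME so a real ~/.gitconfig is not touched. The odb name "magic" and the helper path are hypothetical, and the odb.* keys only have an effect with a Git built from this experimental series; stock Git just stores them like any other config variable.

```shell
#!/bin/sh
# Sketch: put the helper configuration in the global config file
# instead of passing "-c odb.<odbname>...=<helper>" to each clone.
set -e
HOME=$(mktemp -d)   # throwaway home directory for this demo
export HOME
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

# Hypothetical odb name and helper path, using the script mode key
git config --global odb.magic.scriptCommand "$HOME/bin/odb-helper"
git config --global --get odb.magic.scriptCommand
```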

> The odbrefspec is optional, right?

Yes, using "--initial-refspec <odbrefspec>" is optional. The next
version of the series will document this option in more detail.
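The odb refs themselves are ordinary refs under refs/odbs/, so they can be created and shared with stock Git, for instance with the fetch refspec quoted above. The minimal local sketch below stores an information blob, points refs/odbs/magic/bundle at it, and fetches it into a second repository. The odb name "magic" (taken from the helper example), the repository names and the blob content are illustrative.

```shell
#!/bin/sh
# Sketch: create an odb ref pointing to an information blob, then share
# it using the refspec "refs/odbs/<odbname>/*:refs/odbs/<odbname>/*".
set -e
work=$(mktemp -d)
cd "$work"

git init -q origin
# Store the information blob in the Git repository...
info=$(printf 'bundle url: https://example.com/repo.bundle\n' |
    git -C origin hash-object -w --stdin)
# ...and point an odb ref at it
git -C origin update-ref refs/odbs/magic/bundle "$info"

# Another repository fetches only the odb refs
git init -q clone
git -C clone fetch -q ../origin "refs/odbs/magic/*:refs/odbs/magic/*"
git -C clone cat-file blob refs/odbs/magic/bundle
```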

> I have the impression there are a number of topics on the list
> that tackle the "many/big objects in a Git repo" problem. Is
> there a write up about the status of them, how they relate
> to each other, and what the current problems are?
> I found the following but it looks abandoned:
> https://github.com/jrn/git-large-repositories

Yeah, it could be interesting to discuss all these topics together. On
the other hand, people working on existing patch series, like me, also
have to keep working on them and posting new versions, since discussion
alone is not enough to move things forward.

Anyway, Junio and Jonathan Tan have also asked me how my work relates
to Jonathan's, so I will hopefully reply to them soon...