Re: [RFC/PATCH v4 00/49] Add initial experimental external ODB support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 20 Jun 2017 09:54:34 +0200
Christian Couder <christian.couder@xxxxxxxxx> wrote:

> Git can store its objects only in the form of loose objects in
> separate files or packed objects in a pack file.
> 
> To be able to better handle some kind of objects, for example big
> blobs, it would be nice if Git could store its objects in other object
> databases (ODB).

Thanks for this, and sorry for the late reply. It's good to know that
others are thinking about "missing" objects in repos too.

>   - "have": the helper should respond with the sha1, size and type of
>     all the objects the external ODB contains, one object per line.

This should work well if we are not caching this "have" information
locally (that is, if the object store can be accessed with low latency),
but I am not sure if this will work otherwise. I see that you have
proposed a local cache-using method later in the e-mail - my comments on
that are below.

>   - "get <sha1>": the helper should then read from the external ODB
>     the content of the object corresponding to <sha1> and pass it to
> Git.

This makes sense - I have some patches [1] that implement this with the
"fault_in" mechanism described in your e-mail.

[1] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@xxxxxxxxxx/

> * Transfering information
> 
> To tranfer information about the blobs stored in external ODB, some
> special refs, called "odb ref", similar as replace refs, are used in
> the tests of this series, but in general nothing forces the helper to
> use that mechanism.
> 
> The external odb helper is responsible for using and creating the refs
> in refs/odbs/<odbname>/, if it wants to do that. It is free for
> example to just create one ref, as it is also free to create many
> refs. Git would just transmit the refs that have been created by this
> helper, if Git is asked to do so.
> 
> For now in the tests there is one odb ref per blob, as it is simple
> and as it is similar to what git-lfs does. Each ref name is
> refs/odbs/<odbname>/<sha1> where <sha1> is the sha1 of the blob stored
> in the external odb named <odbname>.
> 
> These odb refs point to a blob that is stored in the Git
> repository and contain information about the blob stored in the
> external odb. This information can be specific to the external odb.
> The repos can then share this information using commands like:
> 
> `git fetch origin "refs/odbs/<odbname>/*:refs/odbs/<odbname>/*"`
> 
> At the end of the current patch series, "git clone" is teached a
> "--initial-refspec" option, that asks it to first fetch some specified
> refs. This is used in the tests to fetch the odb refs first.
> 
> This way only one "git clone" command can setup a repo using the
> external ODB mechanism as long as the right helper is installed on the
> machine and as long as the following options are used:
> 
>   - "--initial-refspec <odbrefspec>" to fetch the odb refspec
>   - "-c odb.<odbname>.command=<helper>" to configure the helper

A method like this means that information about every object is
downloaded, regardless of which branches were actually cloned, and
regardless of what parameters (e.g. max blob size) were used to control
the objects that were actually cloned.

We could make, say, one "odb ref" per size and branch - for example,
"refs/odbs/master/0", "refs/odbs/master/1k", "refs/odbs/master/1m", etc.
- and have the client know which one to download. But this wouldn't
scale if we introduce different object filters in the clone and fetch
commands.

I think that it is best to have upload-pack send this information
together with the packfile, since it knows exactly what objects were
omitted, and therefore what information the client needs. As discussed
in a sibling e-mail, clone/fetch already needs to be modified to omit
objects anyway.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux