Re: [RFC/PATCH v3 00/16] Add initial experimental external ODB support

Christian Couder <christian.couder@xxxxxxxxx> · Thu, 15 Dec 2016 10:56:12 +0100

On Tue, Dec 13, 2016 at 9:05 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
>> In general I think that having a lot of refs is really a big problem
>> right now in Git as many big organizations using Git are facing this
>> problem in one form or another.
>> So I think that support for a big number of refs is a separate and
>> important problem that should and hopefully will be solved.
>
> But you do not have to make it worse.
>
> Is "refs" a good match for the problem you are solving?  Or is it
> merely an expedient thing to use?  I think it is the latter, judging
> by your mentioning RefTree.  Whatever mechanism we choose, that will
> be carved into stone in users' repositories and you'd end up having
> to support it, and devise the migration path out of it if the initial
> selection is too problematic.
>
> That is why people (not just me) pointed out upfront that using refs
> for this purpose would not scale.

What I should perhaps have clarified in my previous answer, and also
in the documentation of the patch series, is that in what I have
implemented and what I propose, the external odb helper is responsible
for creating and using the refs in refs/odbs/<odbname>/.

So this helper is free to just create one ref, as it is also free to
create many refs. Git is just transmitting the refs that have been
created by this helper.

Right now people are already free to use whatever external script or
software to create whatever refs/stuff/* they want, pointing to
whatever objects they want, and have Git transmit that. And indeed I
know that it is already a problem out there, as then people often get
into trouble related to having many refs. But it is a different
problem that is not going to be solved anyway in this patch series.

So if some people want to use a specific external odb, it's their
responsibility to use a helper that will not create too many refs.
If they know that they just need their external odb to handle around
10 big files, why wouldn't they use a simple helper that creates one
odb ref per big file/blob?
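
To make the simple case concrete, here is a minimal Python sketch of
what such a helper could compute. The ref layout
refs/odbs/<odbname>/<blob-id> and the function name are assumptions
made for illustration only, not something this email specifies. The
output lines follow the "update <ref> <newvalue>" form accepted by
`git update-ref --stdin`.

```python
def per_blob_ref_updates(odb_name, blob_ids):
    """Return one `git update-ref --stdin` line per distinct blob.

    Hypothetical layout: each blob gets its own ref named after its
    object id, pointing at that blob.
    """
    prefix = f"refs/odbs/{odb_name}/"
    # Deduplicate and sort so the output is stable across runs.
    return [f"update {prefix}{oid} {oid}" for oid in sorted(set(blob_ids))]
```

The number of refs grows linearly with the number of blobs, which is
fine for around 10 big files and is exactly what stops scaling for
thousands.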

Conversely, if they know that they will need to handle thousands
of big files, then, yeah, they should find or implement a helper that
will, as I suggested in my previous email, just create one ref
in refs/odbs/<odbname>/ that points to a blob that contains a list
(maybe a JSON list with information attached to each item) of the
blobs stored in the external odb.
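
As a rough sketch of that single-ref design, the helper could build
the list like this. The item fields ("oid" plus whatever extra
information is attached), the ref name "manifest" and both function
names are assumptions for illustration; nothing in the series fixes
them.

```python
import json


def build_manifest(blobs):
    """Serialize the blobs held by the external odb as a JSON list.

    `blobs` maps a blob id to a dict of extra information about it
    (hypothetical fields such as size or path).
    """
    items = [{"oid": oid, **info} for oid, info in sorted(blobs.items())]
    # Sorted keys and a trailing newline keep the blob content stable.
    return json.dumps(items, indent=1, sort_keys=True) + "\n"


def manifest_ref(odb_name):
    """Name of the single ref that would point at the manifest blob."""
    return f"refs/odbs/{odb_name}/manifest"
```

A helper could then write the returned text with
`git hash-object -w --stdin` and point the ref at the resulting blob
with `git update-ref`, so that Git transmits one ref regardless of how
many blobs the external odb holds.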

For testing purposes, the patch series uses only simple helpers that
create one odb ref per big file/blob. So yes, it sets a bad example,
because if people just copy this design when they need the e-odb to
handle a large number of files, they will be in trouble. But this does
not by itself carve anything into stone.

One thing that could help is perhaps to put big warnings into the
simple helpers saying "Be careful!!! This will not scale if you want
to handle more than a small number of large files!!! You'd better use
a helper that does <this and that> if you want to handle many large
files!!! You have been warned!!!".

So I am reluctant at this point to write a complex helper just for the
purpose of showing a good example to people who want to use e-odb to
store a large number of files, as these people would probably need
something like Lars' "filter process protocol" anyway.