Re: [PATCH v3 5/5] doc: add technical design doc for large object promisors

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 27 Jan 2025 10:02:18 -0800

Christian Couder <christian.couder@xxxxxxxxx> writes:

>> > +In other words, the goal of this document is not to talk about all the
>> > +possible ways to optimize how Git could handle large blobs, but to
>> > +describe how a LOP based solution could work well and alleviate a
>> > +number of current issues in the context of Git clients and servers
>> > +sharing Git objects.
>>
>> But if you do not discuss even a single way, and handwave "we'll
>> have this magical object storage that would solve all the problems
>> for us", then we cannot really tell if the problem is solved by us,
>> or handwaved away by assuming the magical object storage.
>> We'd need at least one working example.
>
> It's not magical object storage. Amazon S3, GCP Bucket and MinIO
> (which is open source), for example, already exist and are used a lot
> in the industry.

That's just "we can store a bunch of bytes and ask for them to be
retrieved".  What I said about handwaving the presence of magical
"object storage" is exactly the "optimize how to handle large blobs"
part.  I agree that we do not need to discuss _ALL_ the possible
ways.  But without telling what our thoughts are on _how_ to use these
"lower cost and safe by duplication but with high latency" services
to store our objects efficiently enough to make it practical, I'd
have to call what we see in the document "magical object storage".

>> > +7) A client can offload to a LOP
>> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> > +
>> > +When a client is using a LOP that is also a LOP of its main remote,
>> > +the client should be able to offload some large blobs it has fetched,
>> > +but might not need anymore, to the LOP.
>>
>> For a client that _creates_ a large object, the situation would be
>> the same, right?  After it creates several versions of the opening
>> segment of, say, a movie, the latest version may be still wanted,
>> but the creating client may want to offload earlier versions.
>
> Yeah, but it's not clear if the versions of the opening segment should
> be sent directly to the LOP without the main remote checking them in
> some ways (hooks might be configured only on the main remote) and/or
> checking that they are connected to the repo. I guess it depends on
> the context if it would be OK or not.

If it is not clear to us or whoever writes this document, the users
would have a hard time making effective use of it, which is why I
am worried about the current design of this feature.

Thanks for clarifying other parts of my confusion.