Re: [PATCH v2] bulk-checkin: only support blobs in index_bulk_checkin

"Eric W. Biederman" <ebiederm@xxxxxxxxx> · Wed, 20 Sep 2023 07:24:49 -0500

Junio C Hamano <gitster@xxxxxxxxx> writes:

> "Eric W. Biederman" <ebiederm@xxxxxxxxx> writes:
>
>> As far as I can tell this extra pass defeats most of the purpose of
>> streaming, and it is much easier to implement with in memory buffers.
>
> The purpose of streaming being the ability to hash and compute the
> object name without having to hold the entirety of the object, I am
> not sure the above is a good argument.  You can run multiple passes
> by streaming the same data twice if you needed to, and how much
> easier the implementation may become if you can assume that you can
> hold everything in-core, what you cannot fit in-core would not fit
> in-core, so ...

Yes, this wording needs to be clarified.

If streaming to handle objects that don't fit in memory is the purpose,
I agree there are slow multi-pass ways to deal with trees, commits and
tags.

If writing directly to the pack is the purpose, using an in-core
buffer for trees, commits, and tags is better.
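
To make the trade-off concrete, here is a minimal sketch (not code from
the patch) of the two ways to compute an object name.  The hash is a toy
64-bit FNV-1a standing in for the real SHA-1/SHA-256 code, and every
function name below is hypothetical.  The only git fact it relies on is
that the hash covers a "<type> <size>" header, a NUL byte, and then the
content, so the size has to be known before the first byte is fed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: toy stand-in for git's hash context (64-bit FNV-1a). */
typedef struct { uint64_t h; } toy_hash_ctx;

static void toy_hash_init(toy_hash_ctx *c)
{
	c->h = 0xcbf29ce484222325ULL; /* FNV-1a offset basis */
}

static void toy_hash_update(toy_hash_ctx *c, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		c->h ^= *p++;
		c->h *= 0x100000001b3ULL; /* FNV-1a prime */
	}
}

/* Feed the "<type> <size>\0" header that precedes the object content. */
static void toy_hash_header(toy_hash_ctx *c, const char *type, size_t size)
{
	char hdr[64];
	int n = snprintf(hdr, sizeof(hdr), "%s %zu", type, size);

	toy_hash_update(c, hdr, (size_t)n + 1); /* +1 includes the NUL */
}

/* In-core: the whole object is already in one buffer. */
static uint64_t hash_in_core(const char *type, const void *buf, size_t size)
{
	toy_hash_ctx c;

	toy_hash_init(&c);
	toy_hash_header(&c, type, size);
	toy_hash_update(&c, buf, size);
	return c.h;
}

/* Streaming: only one small chunk is ever held in memory. */
static uint64_t hash_streaming(const char *type, FILE *in, size_t size)
{
	unsigned char chunk[8]; /* deliberately tiny for illustration */
	toy_hash_ctx c;

	toy_hash_init(&c);
	toy_hash_header(&c, type, size);
	while (size > 0) {
		size_t want = size < sizeof(chunk) ? size : sizeof(chunk);
		size_t got = fread(chunk, 1, want, in);

		if (got == 0)
			break; /* short input or read error; a real caller must check */
		toy_hash_update(&c, chunk, got);
		size -= got;
	}
	return c.h;
}
```

Both functions produce the same value for the same bytes.  The
streaming variant never needs more than the chunk buffer, but any
additional pass over the data means reading the input again, whereas
the in-core variant can simply walk the buffer a second time.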

I will put the wording on the back burner and see what I come up
with.

>> So if it is needed to write commits, trees, and tags directly to pack
>> files, writing a separate function to do that would be needed.
>
> But I am OK with this conclusion.  As the way to compute the
> fallback hashes for different types of objects are very different,
> compared to a single-hash world where as long as you come up with a
> serialization you have only a single way to hash and name the
> object.  We would end up having separate helper functions per target
> type anyway, even if we kept a single entry point function like
> index_stream().  The single entry point function will only be used
> to just dispatch to type specific ones, so renaming what we have today
> and making it clear they are for "blobs" does make sense.

Good.  I am glad I am able to step back and successfully explain the
whys of things.
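
For illustration only, the "single entry point that just dispatches"
shape described in the quoted text could look roughly like the sketch
below.  None of these names exist in git; the enum, the helpers, and
the return values are all hypothetical and only show the structure.

```c
#include <stddef.h>

/* ASSUMPTION: hypothetical types, not git's real enum object_type. */
enum toy_object_type { TOY_BLOB, TOY_TREE, TOY_COMMIT, TOY_TAG };
enum toy_strategy { TOY_ERROR = -1, TOY_STREAMED, TOY_IN_CORE };

/* Blob content is opaque bytes, so it can be hashed chunk by chunk. */
static enum toy_strategy toy_index_blob(const void *buf, size_t len)
{
	(void)buf; (void)len; /* a real helper would stream to the pack */
	return TOY_STREAMED;
}

/*
 * Trees, commits, and tags refer to other objects by name, so each
 * type needs its own handling; here they just use an in-core buffer.
 */
static enum toy_strategy toy_index_tree(const void *buf, size_t len)
{
	(void)buf; (void)len;
	return TOY_IN_CORE;
}

static enum toy_strategy toy_index_commit(const void *buf, size_t len)
{
	(void)buf; (void)len;
	return TOY_IN_CORE;
}

static enum toy_strategy toy_index_tag(const void *buf, size_t len)
{
	(void)buf; (void)len;
	return TOY_IN_CORE;
}

/* The single entry point does nothing except pick the helper. */
static enum toy_strategy toy_index_object(enum toy_object_type type,
					  const void *buf, size_t len)
{
	switch (type) {
	case TOY_BLOB:   return toy_index_blob(buf, len);
	case TOY_TREE:   return toy_index_tree(buf, len);
	case TOY_COMMIT: return toy_index_commit(buf, len);
	case TOY_TAG:    return toy_index_tag(buf, len);
	default:         return TOY_ERROR;
	}
}
```

With that shape the entry point carries no logic of its own, which is
why naming the existing function for "blobs" loses nothing.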

Eric