Re: [PATCH v5 5/5] apply: fix uninitialized hash function

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 21 May 2024 06:36:31 -0700

Patrick Steinhardt <ps@xxxxxx> writes:

>> +	if (!the_hash_algo)
>> +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
>> +
>
> Do we also want to add a comment here that mentions that we may want to
> make this configureable via a command line option, like we have in the
> preceding commits?

We may want a comment here that says 

    we could to redo the "apply.c" machinery to make this arbitrary
    fallback unnecessary, but the benefit to do so is dubious and
    the risk of breaking the code is probably not worth the effort.

When working as "better GNU patch" mode without a repository, we
should not and do not use the_hash_algo for the purpose of hashing
at all.  We do not do the binary diff (because we cannot grab the
preimage object out of the object store (that does not exist) to
apply the delta to form the postimage, we do not do the 3-way
fallback using the preimage blob object names that appear on the
"index" lines.  As far as I recall, the only thing we use
the_hash_algo for is for the max length of the hash to ask "is this
hexadecimal string a plausible looking object name?  We are parsing
a line that started with 'index ' and trying to see if the line
syntactically looks like a valid 'index' line" and the like.  If we
assume SHA-1 and if somebody tries to injest a SHA-256 --full-index
patch, that logic may say "nah, we didn't find a valid 'index'
header", but I think we'll just leave the fields like old-object
blank, which does not affect anything because we won't do "apply
--3way" anyway.

So we _could_ identify such places and tell the code "when
the_hash_algo is NULL, instead of the_hash_algo->hexsz, use 0 as
hexsz" (this example is from apply.c:gitdiff_index()) and it indeed
was the approach I tried in my unpublished draft before the first
version I posted.  It quickly got ugly.