Re: [RFC PATCH 2/2] hash API: add and use a hash_short_id_by_algo() function

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Thu, 19 May 2022 21:07:51 +0200

On Thu, May 19 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:
>
>> Add and use a hash_short_id_by_algo() function. As noted in the
>> comment being modified here (added in [1]) the intention wasn't to
>> have these end up in on-disk formats, but since [2], [3] and [3]
>
> double [3]?

*nod*, sorry.

>> that's been the case, and there's an outstanding patch to add another
>> format that uses these[5].
>>
>> So let's expose this functionality as a documented utility function,
>> instead of copy/pasting this code in various places.
>>
>> Replacing the die() in the existing functions with a BUG() might be
>> overzelous, it's correct for the case of
>> e.g. write_commit_graph_file() and write_midx_header(), but we also
>> use this for parsing on-disk files, e.g. in parse_commit_graph().
>
> If we know the offending data can come from outside, not from
> literals in the code, then there is no "might be", such a use of
> BUG() is simply wrong.

Fair enough, we could change it to have 1/2 of this (the writing) use
BUG(), and die() for the other part, or just leave it as die() for both.

>> We could add a "gently" version of this, but for now I think that
>> worrying about the distinction would be worrying too much. If we ever
>> end up parsing such files that'll almost certainly be a bug in our own
>> writing code, so the distinction would be rather academic, even though
>
> It would be a file written by an ancient buggy version of our code,
> or a buggy third-party reimplementation of Git.  It could be that a
> new version of Git is using a yet-to-be-invented algorithm this
> version of Git does not know about.
>
> The distinction matters in that "The version of Git I downloaded and
> built last week out of the latest release tag said BUG" should mean
> only one thing: that version that reports BUG() is the culprit, not
> some random other thing we do not even know where it came from that
> left a corrupt data on disk.

Makes sense.