Re: [RFC PATCH 2/2] hash API: add and use a hash_short_id_by_algo() function

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 19 May 2022 08:50:18 -0700

Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:

> Add and use a hash_short_id_by_algo() function. As noted in the
> comment being modified here (added in [1]) the intention wasn't to
> have these end up in on-disk formats, but since [2], [3] and [3]

double [3]?

> that's been the case, and there's an outstanding patch to add another
> format that uses these[5].
>
> So let's expose this functionality as a documented utility function,
> instead of copy/pasting this code in various places.
>
> Replacing the die() in the existing functions with a BUG() might be
> overzelous, it's correct for the case of
> e.g. write_commit_graph_file() and write_midx_header(), but we also
> use this for parsing on-disk files, e.g. in parse_commit_graph().

If we know the offending data can come from outside, not from
literals in the code, then there is no "might be", such a use of
BUG() is simply wrong.

> We could add a "gently" version of this, but for now I think that
> worrying about the distinction would be worrying too much. If we ever
> end up parsing such files that'll almost certainly be a bug in our own
> writing code, so the distinction would be rather academic, even though

It would be a file written by an ancient buggy version of our code,
or a buggy third-party reimplementation of Git.  It could be that a
new version of Git is using a yet-to-be-invented algorithm this
version of Git does not know about.

The distinction matters in that "The version of Git I downloaded and
built last week out of the latest release tag said BUG" should mean
only one thing: that version that reports BUG() is the culprit, not
some random other thing we do not even know where it came from that
left a corrupt data on disk.