On Thu, May 19 2022, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes: > >> Add and use a hash_short_id_by_algo() function. As noted in the >> comment being modified here (added in [1]) the intention wasn't to >> have these end up in on-disk formats, but since [2], [3] and [3] > > double [3]? *nod*, sorry. >> that's been the case, and there's an outstanding patch to add another >> format that uses these[5]. >> >> So let's expose this functionality as a documented utility function, >> instead of copy/pasting this code in various places. >> >> Replacing the die() in the existing functions with a BUG() might be >> overzelous, it's correct for the case of >> e.g. write_commit_graph_file() and write_midx_header(), but we also >> use this for parsing on-disk files, e.g. in parse_commit_graph(). > > If we know the offending data can come from outside, not from > literals in the code, then there is no "might be", such a use of > BUG() is simply wrong. Fair enough, we could change it to have 1/2 of this (the writing) use BUG(), and die() for the other part, or just leave it as die() for both. >> We could add a "gently" version of this, but for now I think that >> worrying about the distinction would be worrying too much. If we ever >> end up parsing such files that'll almost certainly be a bug in our own >> writing code, so the distinction would be rather academic, even though > > It would be a file written by an ancient buggy version of our code, > or a buggy third-party reimplementation of Git. It could be that a > new version of Git is using a yet-to-be-invented algorithm this > version of Git does not know about. > > The distinction matters in that "The version of Git I downloaded and > built last week out of the latest release tag said BUG" should mean > only one thing: that version that reports BUG() is the culprit, not > some random other thing we do not even know where it came from that > left a corrupt data on disk. Makes sense.