On 5/25/2022 3:53 AM, Jonathan Nieder wrote: > Taylor Blau wrote: >> On Tue, May 24, 2022 at 11:55:02PM +0200, Ævar Arnfjörð Bjarmason wrote: > >>>> Moreover, I can't seem to find any formats that _don't_ use that >>>> convention. >>> >>> It's used in the reftable format. The use in reftable is the only one I can find and that implementation is not idiomatic. Specifically, the way the four-byte header was implemented is not easy to extract and share in other formats. This series does the good work of extracting oid_version() as a common method across these formats so it is easier to share. > It's also used in the formats described in > Documentation/technical/hash-function-transition. It documents things that have not been implemented, such as the v3 pack-index format: Pack index (.idx) files use a new v3 format that supports multiple hash functions. They have the following format (all integers are in network byte order): (...) * 4-byte number of object formats in this pack index: 2 * For each object format: ** 4-byte format identifier (e.g., 'sha1' for SHA-1) ** 4-byte length in bytes of shortened object names. This is the shortest possible length needed to make names in the shortened object name table unambiguous. ** 4-byte integer, recording where tables relating to this format are stored in this index file, as an offset from the beginning. This was added in your 752414ae431 (technical doc: add a design doc for hash function transition, 2017-09-27), but has not been acted upon yet. > [...] >> Sounds good. Unless others have a very strong opinion, let's leave it as >> is. > > File formats are one of those things where a little time early can save > a lot of work later. If there were a strong reason to use "1" and "2" > here then I'd be okay with living with it --- I'm a pragmatic person. > But in general, using the magic numbers instead of a sequential value is > really helpful both in making the file formats more self-explanatory and > in making it possible to experiment with multiple new hash_algos at the > same time. > > The main argument I'm hearing for using "1" and "2" is "because some > other formats got that wrong". That reason is the opposite of > compelling to me: it makes me suspect that as a project we should more > eagerly break the old bad habits and form new ones. I guess this > qualifies as a very strong opinion. Either way, these are magic numbers. One happens to somewhat spell out something when looking at the file in a hex editor with ASCII previews, but that doesn't change the fact that it is most important that the hash function is correctly indicated by the file format and parsed by the Git executable (not a human). I'd much rather have a consistent and proven way of specifying the hash value (using the oid_version() helper) than to try and make a new mechanism. Thanks, -Stolee