On Fri, 14 Aug 2020 at 14:37, Derrick Stolee <stolee@xxxxxxxxx> wrote: > > On 8/14/2020 8:21 AM, Martin Ågren wrote: > > We say that value 1 means "SHA-1", but in fact, it means "whatever > > the_hash_algo is", see commit c166599862 ("commit-graph: convert to > > using the_hash_algo", 2018-11-14). > > > > Signed-off-by: Martin Ågren <martin.agren@xxxxxxxxx> > > --- > > If we want to be more fine-grained in the future, we'll need to say, > > e.g., "2 means SHA-1, 3 means SHA-256" or, perhaps preferrably, bump the > > version number. > > > > I wonder: Should we instead say "1 means SHA-1, 2 means SHA-256"? It > > could be implemented as "easily" as "if (value_from_header != > > value_from_the_hash_algo) die(...);" for now. Might that pay off in the > > long run? > > > > This relates to Stolee's "in a vacuum" comment [1] ... so maybe we're > > fine. > > I think that was the intention of the byte, but that is not what ended > up happening. When I wrote this patch, I did go with "fix <thing>" rather than "document SHA-256". For the other patches, it's sort of obvious how those formats are so old that it's no wonder they assumed SHA-1. But here, we did go to some trouble to try and future-proof things and already had NewHash in mind. So that calls for "fix <thing>". But I'm more and more starting to think that it's the implementation that should be fixed and that the docs should just be extended to add "2 means SHA-256". > If we want that to be the case, then we should do that > work as part of the 2.29 cycle before we release with the ability to > create SHA-256 repos (which will lock the commit-graph format for these > repos). Part of my reasoning behind [3] was that in exactly a situation like this, we'd be able to say With extensions.objectFormat=sha256, Git 2.29-2.30 will barf at the commit-graph files that Git 2.31+ generate, and the other way around. Users will be able to remove "old" files and regenerate them, and shouldn't use a mixed environment. and know that those users knew this might happen. But certainly, if we can avoid it altogether by changing behavior already in 2.29, that would be better. [3] https://lore.kernel.org/git/20200806202358.2265705-1-martin.agren@xxxxxxxxx/ > (By "we" I mean that I would try to do this work in a way that minimizes > conflicts with the current commit-graph work in flight [1] [2].) None of those seems to touch `oid_version()`, so if we can just tweak that function to return 1 or 2 instead of always 1, I guess that's one way. > [1] https://lore.kernel.org/git/pull.676.v2.git.1596941624.gitgitgadget@xxxxxxxxx/ > > [2] https://lore.kernel.org/git/cover.1597178914.git.me@xxxxxxxxxxxx/ > > > > > - 1-byte Hash Version (1 = SHA-1) > > - We infer the hash length (H) from this value. > > + 1-byte Hash Version (1 = SHA-1 in SHA-1 repo, SHA-256 in SHA-256 repo) > > + We infer the hash length (H) from the hash algo derived from this value. > > If we are _not_ changing the format to have a meaningful value in > this byte, then this documentation should be updated to state that > this byte must always have value 1, as it does not provide any > information. We could still go 1 means whatever extensions.objectFormat says, 2 means SHA-1, 3 means SHA-256, ... But maybe that would just be crazy. Thanks for all your comments Martin