On Sat, 16 May 2020 at 22:47, brian m. carlson
<sandals@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2020-05-16 at 11:18:12, Martin Ågren wrote:
> > On Wed, 13 May 2020 at 02:56, brian m. carlson
> > <sandals@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > git index-pack is usually run in a repository, but need not be. Since
> > > packs don't contain information on the algorithm in use, instead
> > > relying on context, add an option to index-pack to tell it which one
> > > we're using in case someone runs it outside of a repository.
> >
> > Similar to an earlier patch where we modify `the_hash_algo` like this, I
> > feel a bit nervous. What happens if you pass in a "wrong" algo here,
> > i.e., SHA-1 in a SHA-256 repo? Or, given the motivation in the commit
> > message, should this only be allowed if we really *are* outside a repo?
>
> Unfortunately, we can't prevent the user from being inside repository A,
> which is SHA-1, while invoking git index-pack on repository B, which is
> SHA-256.

Ah, I see.

> That is valid without --stdin, if uncommon, and it needs to be
> supported. I can prevent it from being used with --stdin, though.

Hmm, that might make sense. I suppose it could quickly get out of
control with bug reports coming in along the lines of "if I do this
really crazy git index-pack invocation, I manage to mess things up".

The easiest way to address this might be through documentation, i.e.,
"don't use this option", "for internal use" or even "to be used by the
test suite only", for which there is even precedent in
git-index-pack(1). On the other hand, if we need to detect such a hash
mismatch even once the SHA-256 work is 100% complete, then I suppose we
really should try a bit to catch bad invocations.

As a tangent, I see that v2.27.0 will come with `git init
--object-format=<format>` and `GIT_DEFAULT_HASH_ALGORITHM`. The docs for
the former mention "(if enabled)". Should we add something more scary to
those to make it clear that they shouldn't be used and that you
basically shouldn't even try to figure out how to enable them?

I can already see the tweets and blog posts a few weeks from now about
how you can build Git from source setting a single switch, run git init
--object-format=sha256 and you're in the future! Which will just lead to
pain some days or weeks later.... "I've done lots of work. How do I
convert my repo to SHA-1 so I can share it?"...

We've added "experimental" things before and tried to document the
experimental nature. Maybe here we're not even "experimental" -- more
like "if you use this in production, you *will* suffer"?

> If you pass in a wrong algorithm, we usually blow up with an inflate
> error because we consume more bytes than expected with our ref deltas.
> I'm not aware of any cases where we segfault or access invalid memory;
> we just blow up in a nonobvious way. That's true, too, if you manually
> tamper with the algorithm in extensions.objectformat; usually we blow up
> (but not segfault) because the index is "corrupt".

Ok, I see. I suppose "some time", we could tweak error messages to hint
about an object-format mismatch, but I don't think that needs to block
your work here now.

Martin
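
P.S. To make the inflate failure described above concrete, here is a
rough Python sketch. It is not Git's code: the entry layout is heavily
simplified (a raw base object ID followed directly by zlib data, with the
real type/size header left out), and the helper names `build_ref_delta`
and `read_ref_delta` are made up for illustration. The only point is that
a reader assuming a 32-byte hash skips 12 bytes too far into an entry
written with a 20-byte hash, so inflation starts at the wrong offset.

```python
import zlib

SHA1_LEN, SHA256_LEN = 20, 32  # raw object ID sizes in bytes


def build_ref_delta(base_id: bytes, delta: bytes) -> bytes:
    """Hypothetical, simplified entry: base object ID + deflated delta."""
    # Level 0 (stored) keeps the compressed bytes predictable for the demo.
    return base_id + zlib.compress(delta, 0)


def read_ref_delta(entry: bytes, hash_len: int):
    """Split an entry using the hash length the reader believes is in use."""
    base_id = entry[:hash_len]
    # If hash_len is wrong, this starts mid-stream and zlib raises an error.
    delta = zlib.decompress(entry[hash_len:])
    return base_id, delta


base = bytes(range(SHA1_LEN))           # stand-in for a SHA-1 object ID
payload = b"example delta instructions"  # stand-in for real delta data
entry = build_ref_delta(base, payload)

# Correct hash length: the entry round-trips.
print(read_ref_delta(entry, SHA1_LEN) == (base, payload))

# Wrong hash length: too many bytes are consumed and inflation fails.
try:
    read_ref_delta(entry, SHA256_LEN)
except zlib.error as err:
    print("inflate failed:", err)
```

Nothing in the resulting error mentions the hash algorithm, which is
roughly why the real failure is so nonobvious.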