On 2024-09-01 at 23:45:43, Patrick Steinhardt wrote:
> Unfortunately, this once again uncovers a deeper issue: neither the
> packfile nor their index encode the object format they use. So while
> falling back to SHA1 papers over the issue, it means that we misparse
> SHA256 indices. Also, we misparse SHA1 indices if we happen to be in a
> SHA256 repository. E.g. when parsing a SHA256 file in a SHA1 repo:
>
>     $ git index-pack --verify '/tmp/git-tests/trash directory.t5300-pack-object/repo/.git/objects/pack/pack-aa45f7f08f043c9f0388f1844a2a797587254e249919b35ac9dc2b52c1aada29.pack'
>     error: wrong index v2 file size in /tmp/git-tests/trash directory.t5300-pack-object/repo/.git/objects/pack/pack-aa45f7f08f043c9f0388f1844a2a797587254e249919b35ac9dc2b52c1aada29.idx
>     fatal: Cannot open existing pack idx file for '/tmp/git-tests/trash directory.t5300-pack-object/repo/.git/objects/pack/pack-aa45f7f08f043c9f0388f1844a2a797587254e249919b35ac9dc2b52c1aada29.idx'
>
> The error message isn't even properly indicating what the actual issue
> is.

Yes, this is also true of other formats like the index, but there we
know it must be of the same format as the rest of the repository. I
noticed this while writing the SHA-256 series, and it's inconvenient.
If you blame some of the tests that add the `--object-format` entry, I
wrote them.

> One potential solution would be to try and derive the object format from
> the hash that the packfile index name has. But that is quite roundabout
> and rather ugly, and packfiles may not necessarily have that hash in the
> first place. It would also become potentially ambiguous in the future if
> we were to ever adopt another hash that has the same length as either
> SHA1 or SHA256.

Yes, we've decided not to derive things by their length except in the
dumb HTTP protocol for this reason.

> So we basically have three different options:
>
>   - Accept that we just don't handle this case correctly and let the
>     code error out. This pessimizes all hashes but SHA256.
>
>   - Bail out when outside of a repository when `--object-format=` wasn't
>     given. This pessimizes all hashes, but gives a clear indicator to
>     the user why things don't work.

This is what I would recommend.

>   - Introduce packfiles v3 and encode the object format into the header.
>     Then do either (1) or (2) on top.

I think we have pack v3 already (which is the same as v2), and v4 was
for an experimental format that never landed fully. Maybe v5?

If you wanted to do this, you could add support for arbitrary chunks,
like with multi-pack indexes, which would allow for extensibility in
the future. However, you'd also need some protocol capabilities if you
want to send pack v5 or certain chunks over the protocol.

> The last option is of course the cleanest, but also the most involved.

I'd personally recommend just requiring the `--object-format=` option,
but of course if you want to write pack v5, don't let me stop you.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA
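
For illustration, here is a rough sketch of why the example above ends
in "wrong index v2 file size" rather than a message about the hash. It
assumes the documented v2 index layout (8-byte header, 256-entry fanout,
then per object one object name, one CRC32 and one 4-byte offset,
optional 8-byte offsets, and two trailing checksums). The function names
are hypothetical and this is not Git's actual code; it only shows that
the expected file size depends on the hash length the reader assumes.

```c
#include <stdint.h>

/* Raw hash sizes in bytes for the SHA-1 and SHA-256 object formats. */
enum { SHA1_RAWSZ = 20, SHA256_RAWSZ = 32 };

/*
 * Smallest valid size of a v2 pack index holding nr objects
 * (hypothetical helper): header, fanout table, per-object name +
 * CRC32 + 4-byte offset, then the pack and index checksums.
 */
uint64_t idx_v2_min_size(uint64_t nr, uint64_t hashsz)
{
	return 8 + 4 * 256 + nr * (hashsz + 4 + 4) + 2 * hashsz;
}

/* Largest valid size: all but one object may carry an extra 8-byte offset. */
uint64_t idx_v2_max_size(uint64_t nr, uint64_t hashsz)
{
	uint64_t max = idx_v2_min_size(nr, hashsz);

	if (nr)
		max += (nr - 1) * 8;
	return max;
}

/* Returns 1 if idx_size is plausible for nr objects with the given hash size. */
int idx_v2_size_ok(uint64_t idx_size, uint64_t nr, uint64_t hashsz)
{
	return idx_size >= idx_v2_min_size(nr, hashsz) &&
	       idx_size <= idx_v2_max_size(nr, hashsz);
}
```

Under these assumptions, a SHA-256 index for 3 objects is at least 1216
bytes, while the largest SHA-1 index for 3 objects is 1172 bytes, so a
reader assuming SHA-1 can only conclude that the size is wrong. Nothing
in the file itself says which hash was used.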