On 2023-01-13 at 13:23:59, Hans Petter Selasky wrote:
> Hi,
>
> Currently GIT only supports cryptographic hashes for its commit tags.
>
> That means:
>
> 1) It's very difficult to edit the history without also recomputing the hash
> tags for all commits after the needed change-point, which then means
> references to a repository is broken.

This is intentional.  Commit and tag signing requires an unbroken Merkle
tree-like construction that prevents the history from being modified by
signing a single commit or tag.

> 2) Only a single bit error in the main repository can break everything!

git fsck is designed to detect this, and by default it's run every time
the repository is repacked (such as by git gc).  But yes, this is a
problem, and changing to an algorithm which isn't cryptographically
secure won't change that.  Prudent users back up data to prevent data
loss.

> 3) Illicit contents may be present in binary blobs, which in the future may
> be need to be removed without warrant and the only way to do that is by
> rebasing and force pushing, which will break "everything". It can be
> everything from child-porn to expired distribution licenses.

This is a problem in every Merkle tree-like system.  Most repositories
have some sort of code review or access control that prevents people
from generally pushing inappropriate content.  For example, if somebody
proposed to push any sort of pornography or other inappropriate content
(e.g., a racist screed) to one of my repositories or one of my
employer's, I'd refuse to approve or merge such a change, because that
wouldn't be appropriate for the repository.

I don't feel this is enough of a problem that using a Merkle tree-like
construction is a bad idea, given the benefits it offers.

> Therefore I propose the following changes to GIT.
>
> 1) Use a CRC128 / 256 or 512 non-cryptographic based hashing algorithm as
> default.

As the person who wrote the SHA-256 support, I'm pleased to report that
adding a new hash algorithm isn't very difficult anymore.  The largest
part of the work is updating all the tests.  I've tried very hard to
make this substantially easier for everyone.

However, Git is moving in the direction of stronger cryptographic
algorithms, rather than insecure hashing algorithms.  I don't think your
proposal is a good idea, nor do I think it's likely to be adopted.  If
it were adopted, the signing of commits and tags would be meaningless,
and because it would be trivial to create collisions[0], there would
clearly be some pairs of objects which could not be stored.  This would
make Git much less useful, and it might allow users to attempt to forge
or replace content without being detected.

That being said, you are free to create your own fork of the code which
does so, provided you comply with the terms of the license.

> 2) Add support for a CRC fixup field, which usually is zero, but when merges
> are needed, it can be non-zero, to allow the hash-tag-value to remain the
> same! This also allows for easy conversion of existing GIT repositories to
> the new scheme.

For the same reason as above, I don't think this is a good idea.

> 3) All git objects should be uncompressed.

This would dramatically increase the size of most repositories.  I've
easily seen repositories where the uncompressed contents exceed 1 TB in
size yet the repository is only double-digit gigabytes, if that.  Most
people will find the increase in disk usage unacceptable, and I'm
certain that includes Git hosters.

[0] CRC is linear and the following relations apply, which makes forgery
trivial (see https://en.wikipedia.org/wiki/Cyclic_redundancy_check):

  CRC(x XOR y) = CRC(x) XOR CRC(y) XOR c for some c
  CRC(x XOR y XOR z) = CRC(x) XOR CRC(y) XOR CRC(z)
--
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
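
The two relations in footnote [0] can be checked numerically.  What follows
is a minimal sketch rather than anything from Git itself: it assumes
Python's zlib.crc32 (a 32-bit CRC) as a stand-in for the wider CRCs in the
proposal, it assumes all inputs have the same length, and the helper names
xor_bytes and check_crc_relations are hypothetical.  Under those
assumptions, the constant c works out to the CRC of an all-zero message of
that length.

```python
import os
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR together equal-length byte strings (hypothetical helper)."""
    length = len(chunks[0])
    if any(len(chunk) != length for chunk in chunks):
        raise ValueError("all inputs must have the same length")
    out = bytearray(length)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


def check_crc_relations(n: int = 64) -> bool:
    """Check both footnote relations on random n-byte messages."""
    x, y, z = os.urandom(n), os.urandom(n), os.urandom(n)

    # For equal-length inputs, c is the CRC of n zero bytes.
    c = zlib.crc32(bytes(n))

    # CRC(x XOR y) = CRC(x) XOR CRC(y) XOR c
    two_way = zlib.crc32(xor_bytes(x, y)) == (
        zlib.crc32(x) ^ zlib.crc32(y) ^ c
    )

    # CRC(x XOR y XOR z) = CRC(x) XOR CRC(y) XOR CRC(z)
    three_way = zlib.crc32(xor_bytes(x, y, z)) == (
        zlib.crc32(x) ^ zlib.crc32(y) ^ zlib.crc32(z)
    )
    return two_way and three_way
```

Because the checksum of a combined message follows directly from the
checksums of its parts, anyone who knows a few messages and their CRCs can
predict the CRC of their XOR without any searching, which a cryptographic
hash is designed to prevent.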