Hi,
Currently Git only supports cryptographic hashes for its commit tags.
That means:
1) It's very difficult to edit the history without also recomputing the
hash tags for all commits after the needed change-point, which in turn
means that references to the repository are broken.
2) A single bit error in the main repository can break everything!
3) Illicit content may be present in binary blobs, which in the future
may need to be removed without warrant, and the only way to do that is
by rebasing and force pushing, which will break "everything". It can be
anything from child-porn to expired distribution licenses.
Many people think that bit errors cannot happen because the memory uses
ECC and the file system uses cryptographic hashes to verify the
integrity of the data. But what many people forget is that when data is
copied from memory to disk, typically using a DMA channel, it is copied
w/o any kind of integrity protection. The integrity protection is only
per-link, not end-to-end.
Therefore I propose the following changes to Git.
1) Use a non-cryptographic CRC based hashing algorithm (CRC-128, CRC-256
or CRC-512) as default.
2) Add support for a CRC fixup field, which usually is zero, but which
can be non-zero when merges are needed, to allow the hash tag value to
remain the same! This also allows for easy conversion of existing Git
repositories to the new scheme.
3) All git objects should be uncompressed.
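To illustrate the fixup field in item 2, here is a minimal sketch. It
is only an assumption of how such a field could be computed, not an
existing Git feature. It uses CRC-32 from Python's zlib as a stand-in
for CRC-128/256/512, and crc_fixup() is a hypothetical name. It relies
on the fact that, for a fixed message length, a CRC is an affine
function of the message bits over GF(2), so the 32 fixup bits can be
found by solving a small linear system. In this sketch the fixup is
simply the last four bytes of the hashed data.

```python
import zlib


def crc_fixup(data: bytes, target: int) -> bytes:
    """Return 4 bytes so that zlib.crc32(data + fixup) == target.

    Hypothetical helper: CRC-32 stands in for the wider CRCs proposed.
    """
    # CRC of the data followed by an all-zero fixup field.
    base = zlib.crc32(data + bytes(4))

    # How each single fixup bit changes the CRC (the linear part).
    cols = []
    for bit in range(32):
        patch = (1 << bit).to_bytes(4, "little")
        cols.append(zlib.crc32(data + patch) ^ base)

    # Gaussian elimination over GF(2): for every leading bit keep a
    # vector together with the mask of fixup bits that produces it.
    basis = {}
    for bit, vec in enumerate(cols):
        mask = 1 << bit
        while vec:
            top = vec.bit_length() - 1
            if top not in basis:
                basis[top] = (vec, mask)
                break
            bvec, bmask = basis[top]
            vec ^= bvec
            mask ^= bmask

    # Express the wanted CRC difference as a combination of fixup bits.
    need = target ^ base
    fixup_bits = 0
    while need:
        top = need.bit_length() - 1
        bvec, bmask = basis[top]
        need ^= bvec
        fixup_bits ^= bmask
    return fixup_bits.to_bytes(4, "little")


# Usage: the edited object keeps the hash tag of the original one.
original = b"commit with a blob that must be removed"
edited = b"commit with the blob removed"
wanted = zlib.crc32(original)
fixup = crc_fixup(edited, wanted)
assert zlib.crc32(edited + fixup) == wanted
```

The same approach should carry over to a wider CRC, with the fixup
field as wide as the CRC itself.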
CRC-XXX can easily be used to correct multiple bit errors without any
performance overhead.
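As a minimal sketch of the correction idea, again assuming CRC-32 from
Python's zlib as a stand-in and a hypothetical function name, a single
flipped bit can be located by trying each bit position until the CRC
matches again. This naive loop only handles one bit error and is not
free; a faster variant would look up the error position from the
difference between the stored and the computed CRC.

```python
import zlib
from typing import Optional


def correct_single_bit(data: bytes, expected_crc: int) -> Optional[bytes]:
    """Try to repair one flipped bit so that the CRC-32 matches again.

    Returns the repaired data, or None if no single-bit flip helps.
    """
    if zlib.crc32(data) == expected_crc:
        return data  # nothing to repair

    buf = bytearray(data)
    for i in range(len(buf) * 8):
        buf[i // 8] ^= 1 << (i % 8)      # flip candidate bit
        if zlib.crc32(buf) == expected_crc:
            return bytes(buf)
        buf[i // 8] ^= 1 << (i % 8)      # undo, try the next bit
    return None


# Usage: damage one bit of an object and repair it from its CRC.
good = b"blob 11\x00hello world"
stored_crc = zlib.crc32(good)
damaged = bytearray(good)
damaged[9] ^= 0x04
assert correct_single_bit(bytes(damaged), stored_crc) == good
```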
Please CC me. I'm not subscribed to this list.
--HPS