If someone spends the time to work through the documentation, the
subject "hashes" can lead to contradictions: The README of the initial
commit states hashes are generated from compressed data (which changed
very soon), whereas Documentation/user-manual.txt says they are
generated from original data.

Don't give doubts a chance: clarify this and present a simple example
on how object hashes can be generated manually.

Signed-off-by: Dirk Gouders <dirk@xxxxxxxxxxx>
---
 Documentation/user-manual.txt | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
index 6433903491..8dfb81e045 100644
--- a/Documentation/user-manual.txt
+++ b/Documentation/user-manual.txt
@@ -4095,6 +4095,39 @@
 that is used to name the object is the hash of the original data
 plus this header, so `sha1sum` 'file' does not match the object name
 for 'file'.
+Starting with the initial commit, hashing was done on the compressed
+data and the file README of that commit explicitly states this:
+
+"The SHA1 hash is always the hash of the _compressed_ object, not the
+original one."
+
+This changed soon after that with commit
+d98b46f8d9a3 (Do SHA1 hash _before_ compression.). Unfortunately, the
+commit message doesn't provide the detailed reasoning.
+
+The following is a short example that demonstrates how hashes can be
+generated manually:
+
+Let's assume a small text file with the content "Hello git.\n":
+-------------------------------------------------
+$ cat > hello.txt <<EOF
+Hello git.
+EOF
+-------------------------------------------------
+
+We can now manually generate the hash `git` would use for this file:
+
+- The object we want the hash for is of type "blob" and its size is
+  11 bytes.
+
+- Prepend the object header to the file content and feed this to
+  sha1sum(1):
+
+-------------------------------------------------
+$ printf "blob 11\0" | cat - hello.txt | sha1sum
+7217614ba6e5f4e7db2edaa2cdf5fb5ee4358b57  -
+-------------------------------------------------
+
 As a result, the general consistency of an object can always be tested
 independently of the contents or the type of the object: all objects can
 be validated by verifying that (a) their hashes match the content of the
-- 
2.43.0
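
For readers who want to cross-check the shell pipeline in the patch without sha1sum(1), the same computation can be sketched in Python. This is only an illustration and not part of the patch: the function name `git_blob_hash` is hypothetical, and the sketch assumes a SHA-1 repository and an object of type "blob". It builds the header "blob <size>\0", prepends it to the uncompressed content, and hashes the result.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Return the SHA-1 object name for a blob with this content.

    Hypothetical helper for illustration; assumes a SHA-1 repository.
    """
    # Header is the object type, a space, the size in bytes, and a NUL byte.
    header = b"blob %d\0" % len(data)
    # The hash covers the header plus the original (uncompressed) data.
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Same content as hello.txt in the patch: 11 bytes including the newline.
    print(git_blob_hash(b"Hello git.\n"))
```

If the header is built the same way, the printed value should agree with the output of the `printf ... | sha1sum` pipeline and with `git hash-object hello.txt`.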