[PATCH v2 0/1] Documentation/user-manual.txt: try to clarify on object hashes

Dirk Gouders <dirk@xxxxxxxxxxx> · Tue, 12 Mar 2024 11:41:55 +0100



This is the second round of adding a hashing example to user-manual.txt.
---
Changes in v2:
- Do not go into detail about hashing in the history.
- Change code according to coding guidelines.
- Fix a typo (s/asume/assume/) and change the wording of that sentence.
- Write Git instead of `git`.
- To match the rest of the document, change the sample content to
  "Hello world" (length 12).
- Add verification of hash using `git hash-object`.
- Provide for empty lines around code blocks.
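
For reviewers who want to try the calculation on files other than the
sample, the steps from the new manual text (object header, NUL byte, file
content, SHA-1) can be wrapped in a small shell function. This is only a
sketch: `blob_hash` is a hypothetical name that does not appear in the
patch, and it assumes a `sha1sum` binary is available.

```shell
#!/bin/sh
# blob_hash is a hypothetical helper for illustration; it is not part of
# the patch. It prints the SHA-1 object name of a file hashed as a blob.
blob_hash () {
	# Byte count of the content; the arithmetic expansion drops any
	# padding that some wc implementations add.
	size=$(($(wc -c <"$1")))
	# Header "blob <size>" plus a NUL byte, followed by the raw content.
	{ printf "blob %d\0" "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Same sample content as in the patch (12 bytes including the newline).
printf 'Hello world\n' >hello.txt
blob_hash hello.txt
```

For the sample file this should print the hash shown in the range-diff,
802992c4220de19a90767f3000a79a31b98d0df7, and the result for any other
file can be compared against `git hash-object <file>`.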
---
Dirk Gouders (1):
  Documentation/user-manual.txt: example for generating object hashes

 Documentation/user-manual.txt | 36 +++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

Range-diff against v1:
1:  6995f866e7 ! 1:  568c59d69f Documentation/user-manual.txt: example for generating object hashes
    @@ Metadata
      ## Commit message ##
         Documentation/user-manual.txt: example for generating object hashes
     
    -    If someone spends the time to work through the documentation, the
    -    subject "hashes" can lead to contradictions:
    +    Add a simple example on how object hashes can be generated manually.
     
    -    The README of the initial commit states hashes are generated from
    -    compressed data (which changed very soon), whereas
    -    Documentation/user-manual.txt says they are generated from original
    -    data.
    -
    -    Don't give doubts a chance: clarify this and present a simple example
    -    on how object hashes can be generated manually.
    +    Further, because the document suggests to have a look at the initial
    +    commit, clarify that some details changed since that time.
     
         Signed-off-by: Dirk Gouders <dirk@xxxxxxxxxxx>
     
      ## Documentation/user-manual.txt ##
    -@@ Documentation/user-manual.txt: that is used to name the object is the hash of the original data
    +@@ Documentation/user-manual.txt: that not only specifies their type, but also provides size information
    + about the data in the object.  It's worth noting that the SHA-1 hash
    + that is used to name the object is the hash of the original data
      plus this header, so `sha1sum` 'file' does not match the object name
    - for 'file'.
    - 
    -+Starting with the initial commit, hashing was done on the compressed
    -+data and the file README of that commit explicitely states this:
    -+
    -+"The SHA1 hash is always the hash of the _compressed_ object, not the
    -+original one."
    +-for 'file'.
    ++for 'file' (the earliest versions of Git hashed slightly differently
    ++but the conclusion is still the same).
     +
    -+This changed soon after that with commit
    -+d98b46f8d9a3 (Do SHA1 hash _before_ compression.).  Unfortunately, the
    -+commit message doesn't provide the detailed reasoning.
    ++The following is a short example that demonstrates how these hashes
    ++can be generated manually:
     +
    -+The following is a short example that demonstrates how hashes can be
    -+generated manually:
    ++Let's assume a small text file with some simple content:
     +
    -+Let's asume a small text file with the content "Hello git.\n"
     +-------------------------------------------------
    -+$ cat > hello.txt <<EOF
    -+Hello git.
    -+EOF
    ++$ echo "Hello world" >hello.txt
     +-------------------------------------------------
     +
    -+We can now manually generate the hash `git` would use for this file:
    ++We can now manually generate the hash Git would use for this file:
     +
     +- The object we want the hash for is of type "blob" and its size is
    -+  11 bytes.
    ++  12 bytes.
     +
     +- Prepend the object header to the file content and feed this to
    -+  sha1sum(1):
    ++  `sha1sum`:
     +
     +-------------------------------------------------
    -+$ printf "blob 11\0" | cat - hello.txt | sha1sum
    -+7217614ba6e5f4e7db2edaa2cdf5fb5ee4358b57 .
    ++$ { printf "blob 12\0"; cat hello.txt; } | sha1sum
    ++802992c4220de19a90767f3000a79a31b98d0df7  -
     +-------------------------------------------------
     +
    ++This manually constructed hash can be verified using `git hash-object`
    ++which of course hides the addition of the header:
    ++
    ++-------------------------------------------------
    ++$ git hash-object hello.txt
    ++802992c4220de19a90767f3000a79a31b98d0df7
    ++-------------------------------------------------
    + 
      As a result, the general consistency of an object can always be tested
      independently of the contents or the type of the object: all objects can
    - be validated by verifying that (a) their hashes match the content of the
    +@@ Documentation/user-manual.txt: $ git switch --detach e83c5163
    + ----------------------------------------------------
    + 
    + The initial revision lays the foundation for almost everything Git has
    +-today, but is small enough to read in one sitting.
    ++today (even though details may differ in a few places), but is small
    ++enough to read in one sitting.
    + 
    + Note that terminology has changed since that revision.  For example, the
    + README in that revision uses the word "changeset" to describe what we
-- 
2.43.0