On 9/8/2018 2:29 AM, Martin Ågren wrote:
> On Fri, 7 Sep 2018 at 22:24, Ben Peart <peartben@xxxxxxxxx> wrote:
>> Ben Peart <benpeart@xxxxxxxxxxxxx> writes:
>>> - 160-bit SHA-1 over the extension types and their sizes (but not
>>>   their contents). E.g. if we have "TREE" extension that is N-bytes
>>>   long, "REUC" extension that is M-bytes long, followed by "EOIE",
>>>   then the hash would be:
>>
>> The purpose of the SHA isn't to detect disk corruption (we already have
>> a SHA for the entire index that can serve that purpose) but to help
>> ensure that this was actually a valid EOIE extension and not a lucky
>> random set of bytes. [...]
> +#define EOIE_SIZE 24 /* <4-byte offset> + <20-byte hash> */
>
> +	the_hash_algo->init_fn(&c);
> +	while (src_offset < mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER) {
> [...]
> +	the_hash_algo->final_fn(hash, &c);
> +	if (hashcmp(hash, (unsigned char *)index))
> +		return 0;
> +
> +	/* Validate that the extension offsets returned us back to the eoie extension. */
> +	if (src_offset != mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER)
> +		return 0;
>
> Besides the issue you and Junio discussed with "should we document this
> as being SHA-1 or NewHash" (or "the hash algo"), it seems to me that
> this implementation is living somewhere between using SHA-1 and "the
> hash algo". The hashing uses `the_hash_algo`, but the hash size is
> hardcoded at 20 bytes.
>
> Maybe it all works out, e.g., so that when someone (brian) merges a
> NewHash and runs the testsuite, this will fail consistently and in a
> safe way. But I wonder if it would be too hard to avoid the hardcoded 24
> already now.
>
> Martin
I can certainly change this to be:

    #define EOIE_SIZE (4 + GIT_SHA1_RAWSZ)

which should (hopefully) make it easier to find this hard-coded hash
length in the sea of hard-coded "20"s and "160"s (bits) littered
throughout the codebase.