On 9/8/2018 2:29 AM, Martin Ågren wrote:
> On Fri, 7 Sep 2018 at 22:24, Ben Peart <peartben@xxxxxxxxx> wrote:
>> Ben Peart <benpeart@xxxxxxxxxxxxx> writes:
>>> - 160-bit SHA-1 over the extension types and their sizes (but not
>>>   their contents). E.g. if we have "TREE" extension that is N-bytes
>>>   long, "REUC" extension that is M-bytes long, followed by "EOIE",
>>>   then the hash would be:
>>
>> The purpose of the SHA isn't to detect disk corruption (we already have
>> a SHA for the entire index that can serve that purpose) but to help
>> ensure that this was actually a valid EOIE extension and not a lucky
>> random set of bytes. [...]
> +#define EOIE_SIZE 24 /* <4-byte offset> + <20-byte hash> */
>
> +	the_hash_algo->init_fn(&c);
> +	while (src_offset < mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER) {
> [...]
> +	the_hash_algo->final_fn(hash, &c);
> +	if (hashcmp(hash, (unsigned char *)index))
> +		return 0;
> +
> +	/* Validate that the extension offsets returned us back to the eoie extension. */
> +	if (src_offset != mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER)
> +		return 0;
>
> Besides the issue you and Junio discussed with "should we document this
> as being SHA-1 or NewHash" (or "the hash algo"), it seems to me that
> this implementation is living somewhere between using SHA-1 and "the
> hash algo". The hashing uses `the_hash_algo`, but the hash size is
> hardcoded at 20 bytes.
>
> Maybe it all works out, e.g., so that when someone (brian) merges a
> NewHash and runs the testsuite, this will fail consistently and in a
> safe way. But I wonder if it would be too hard to avoid the hardcoded 24
> already now.
>
> Martin
I can certainly change this to be:

    #define EOIE_SIZE (4 + GIT_SHA1_RAWSZ)

which should (hopefully) make it easier to find this hard-coded hash
length in the sea of hard-coded "20"s and "160"s (bits) littered
throughout the codebase.