On Fri, 7 Sep 2018 at 22:24, Ben Peart <peartben@xxxxxxxxx> wrote:
>
> Ben Peart <benpeart@xxxxxxxxxxxxx> writes:
>
> >> - 160-bit SHA-1 over the extension types and their sizes (but not
> >>   their contents). E.g. if we have "TREE" extension that is N-bytes
> >>   long, "REUC" extension that is M-bytes long, followed by "EOIE",
> >>   then the hash would be:
>
> The purpose of the SHA isn't to detect disk corruption (we already have
> a SHA for the entire index that can serve that purpose) but to help
> ensure that this was actually a valid EOIE extension and not a lucky
> random set of bytes.

[...]

> >> +#define EOIE_SIZE 24 /* <4-byte offset> + <20-byte hash> */

> >> +	the_hash_algo->init_fn(&c);
> >> +	while (src_offset < mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER) {

[...]

> >> +	the_hash_algo->final_fn(hash, &c);
> >> +	if (hashcmp(hash, (unsigned char *)index))
> >> +		return 0;
> >> +
> >> +	/* Validate that the extension offsets returned us back to the eoie extension. */
> >> +	if (src_offset != mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER)
> >> +		return 0;

Besides the issue you and Junio discussed with "should we document this
as being SHA-1 or NewHash" (or "the hash algo"), it seems to me that
this implementation is living somewhere between using SHA-1 and "the
hash algo". The hashing uses `the_hash_algo`, but the hash size is
hardcoded at 20 bytes.

Maybe it all works out, e.g., so that when someone (brian) merges a
NewHash and runs the testsuite, this will fail consistently and in a
safe way. But I wonder if it would be too hard to avoid the hardcoded
24 already now.

Martin