Re: [PATCH v2 5/5] Revert "repository: pre-initialize hash algo pointer"

Duy Nguyen <pclouds@xxxxxxxxx> · Sun, 25 Feb 2018 10:29:29 +0700

On Sun, Feb 25, 2018 at 5:58 AM, brian m. carlson
<sandals@xxxxxxxxxxxxxxxxxxxx> wrote:
>> diff --git a/repository.c b/repository.c
>> index 4ffbe9bc94..0d715f4fdb 100644
>> --- a/repository.c
>> +++ b/repository.c
>> @@ -5,7 +5,7 @@
>>
>>  /* The main repository */
>>  static struct repository the_repo = {
>> -     NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, &the_index, &hash_algos[GIT_HASH_SHA1], 0, 0
>> +     NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, &the_index, NULL, 0, 0
>
> I'm wondering, now that you have the name field for the unknown value,
> if that might be a better choice here than NULL.  I don't have a strong
> preference either way, so whatever you decide here is fine.

I did try that first, but for the purpose of catching uninitialized
algo use, NULL is better.

When I set unknown hash algo here, I think some test failed
mysteriously because it used rawsz field (which has value zero), it
didn't match some expectation, the code went to the error handling
path, which eventually failed with some error message, but it's not
obvious that the problem was rawsz being zero and back tracking that
took me some time.

With NULL hash_algo, any dereferencing fails immediately with a nice
stack trace. Another reason to push me towards NULL hash algo is, even
if we prefer nice messages over segmentation faults, we can't avoid it
completely anyway (empty_tree and empty_blob are still NULL in unknown
hash algo and will cause segfaults). Might as well make things
consistent and always segfault.
-- 
Duy