Duy Nguyen <pclouds@xxxxxxxxx> writes:

> This is why I do "size_valid = size_ == size". In my private build, I
> reduced size_ to less than 32 bits and changed the "fits_in_32bits"
> function to do something like
>
>     int fits_in_32bits(unsigned long size)
>     {
>             struct object_entry e;
>             e.size_ = size;
>             return e.size_ == size;
>     }
>
> which makes sure it always works. This spreads the use of "valid = xx
> == yy" to more places, though. I think if we just limit the use of
> this expression to a couple of access wrappers then it's not so bad.

Yes, but then we should name the helper so that it is clear that it
is not about 32 bits but about the width of the e->size_ field.

>>> +	if (!e->size_valid) {
>>> +		unsigned long real_size;
>>> +
>>> +		if (sha1_object_info(e->idx.oid.hash, &real_size) < 0 ||
>>> +		    size != real_size)
>>> +			die("BUG: 'size' is supposed to be the object size!");
>>> +	}
>>
>> If an object that is smaller than 4GB is fed to this function with
>> an incorrect size, we happily record it in e->size_ and declare it
>> valid. Wouldn't that be an equally grave error as the one we are
>> catching in this block?
>
> That adds an extra sha1_object_info() to all objects, and it's
> expensive (I think that is one of the reasons we cache these values
> in object_entry in the first place). I think there are also a few
> occasions where we reuse even bad in-pack objects (there are even
> tests for that), so it's not always safe to die() here.

So what?  My point is that I do not see the point in checking whether
the size is correct on only one side (i.e. when the size is too big to
fit in e->size_) and not on the other. If it is worth checking at all
(perhaps under "#ifndef NDEBUG" or some other debug option?), then I'd
think we should spend the cycles on all objects and check.
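
To make the naming point concrete, here is a minimal sketch of what a
width-agnostic helper could look like.  The struct stub, the
OE_SIZE_BITS constant and the name size_fits_in_entry() are made up
for illustration and are not taken from the patch:

    /* illustrative width; not the value the patch actually uses */
    #define OE_SIZE_BITS 31

    /*
     * Stand-in for the real object_entry; only the size_ bit-field
     * matters for this illustration.
     */
    struct object_entry_stub {
            unsigned size_ : OE_SIZE_BITS;
            unsigned size_valid : 1;
    };

    /*
     * The name no longer promises any particular width; the check is
     * written against the bit-field itself, so it stays correct even
     * if the field later shrinks or grows.
     */
    static int size_fits_in_entry(unsigned long size)
    {
            struct object_entry_stub e;

            e.size_ = size;
            return e.size_ == size;
    }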
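
And to make the "check both sides" suggestion concrete, the kind of
debug-only verification I have in mind would live in the setter and
cover every object, not just the ones whose size does not fit.  The
helper name oe_set_size() is illustrative; struct object_entry,
sha1_object_info() and die() are the existing internals the quoted
hunk already uses:

    static void oe_set_size(struct object_entry *e, unsigned long size)
    {
            e->size_ = size;
            e->size_valid = (e->size_ == size);

    #ifndef NDEBUG
            {
                    unsigned long real_size;

                    /* cross-check the caller-supplied size for all objects */
                    if (sha1_object_info(e->idx.oid.hash, &real_size) < 0 ||
                        size != real_size)
                            die("BUG: 'size' is supposed to be the object size!");
            }
    #endif
    }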