Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry

Duy Nguyen <pclouds@xxxxxxxxx> · Mon, 19 Mar 2018 17:54:48 +0100

On Mon, Mar 19, 2018 at 5:43 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@xxxxxxxxx> writes:
>
>> +static inline void oe_set_size(struct object_entry *e,
>> +                            unsigned long size)
>> +{
>> +     e->size_ = size;
>> +     e->size_valid = e->size_ == size;
>
> A quite similar comment as my earlier one applies here.  I wonder if
> this is easier to read?
>
>         e->size_valid = fits_in_32bits(size);
>         if (e->size_valid)
>                 e->size_ = size;
>
> Stepping back a bit in a different tangent,
>
>  - fits_in_32bits() is a good public name if the helper is about
>    seeing if the given quantity fits in 32bit uint,
>
>  - but that carves it in stone that our e->size_ *will* be 32bit
>    forever, which is not good.
>
> So, it may be a good idea to call it size_cacheable_in_oe(size) or
> something to ask "I have this 'size'; is it small enough to fit in
> the field in the oe, i.e. allow us to cache it, as opposed to having
> to go back to the object every time?"  Of course, this would declare
> that the helper can only be used for that particular field, but that
> is sort of the point of such a change, to allow us to later define
> the e->size_ field to different sizes without affecting other stuff.

This is why I do "size_valid = size_ == size". In my private build, I
reduced size_ to less than 32 bits and change the "fits_in_32bits"
function to do something like

int fits_in_32bits(unsigned long size)
{
struct object_entry e;
e.size_ = size;
return e.size_ == size.
}

which makes sure it always works. This spreads the use of "valid = xx
== yy"  in more places though. I think if we just limit the use of
this expression in a couple access wrappers than it's not so bad.

>> +     if (!e->size_valid) {
>> +             unsigned long real_size;
>> +
>> +             if (sha1_object_info(e->idx.oid.hash, &real_size) < 0 ||
>> +                 size != real_size)
>> +                     die("BUG: 'size' is supposed to be the object size!");
>> +     }
>
> If an object that is smaller than 4GB is fed to this function with
> an incorrect size, we happily record it in e->size_ and declare it
> is valid.  Wouldn't that be equally grave error as we are catching
> in this block?

That adds an extra sha1_object_info() to all objects and it's
expensive (I think it's one of the reasons we cache values in
object_entry in the first place). I think there are also a few
occasions we reuse even bad in-pack objects (there are even tests for
that) so it's not always safe to die() here.
-- 
Duy