On Mon, Mar 19, 2018 at 5:43 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> writes: > >> +static inline void oe_set_size(struct object_entry *e, >> + unsigned long size) >> +{ >> + e->size_ = size; >> + e->size_valid = e->size_ == size; > > A quite similar comment as my earlier one applies here. I wonder if > this is easier to read? > > e->size_valid = fits_in_32bits(size); > if (e->size_valid) > e->size_ = size; > > Stepping back a bit in a different tangent, > > - fits_in_32bits() is a good public name if the helper is about > seeing if the given quantity fits in 32bit uint, > > - but that carves it in stone that our e->size_ *will* be 32bit > forever, which is not good. > > So, it may be a good idea to call it size_cacheable_in_oe(size) or > something to ask "I have this 'size'; is it small enough to fit in > the field in the oe, i.e. allow us to cache it, as opposed to having > to go back to the object every time?" Of course, this would declare > that the helper can only be used for that particular field, but that > is sort of the point of such a change, to allow us to later define > the e->size_ field to different sizes without affecting other stuff. This is why I do "size_valid = size_ == size". In my private build, I reduced size_ to less than 32 bits and change the "fits_in_32bits" function to do something like int fits_in_32bits(unsigned long size) { struct object_entry e; e.size_ = size; return e.size_ == size. } which makes sure it always works. This spreads the use of "valid = xx == yy" in more places though. I think if we just limit the use of this expression in a couple access wrappers than it's not so bad. >> + if (!e->size_valid) { >> + unsigned long real_size; >> + >> + if (sha1_object_info(e->idx.oid.hash, &real_size) < 0 || >> + size != real_size) >> + die("BUG: 'size' is supposed to be the object size!"); >> + } > > If an object that is smaller than 4GB is fed to this function with > an incorrect size, we happily record it in e->size_ and declare it > is valid. Wouldn't that be equally grave error as we are catching > in this block? That adds an extra sha1_object_info() to all objects and it's expensive (I think it's one of the reasons we cache values in object_entry in the first place). I think there are also a few occasions we reuse even bad in-pack objects (there are even tests for that) so it's not always safe to die() here. -- Duy