Duy Nguyen <pclouds@xxxxxxxxx> writes:

> This is why I do "size_valid = size_ == size". In my private build, I
> reduced size_ to less than 32 bits and changed the "fits_in_32bits"
> function to do something like
>
>     int fits_in_32bits(unsigned long size)
>     {
>             struct object_entry e;
>             e.size_ = size;
>             return e.size_ == size;
>     }
>
> which makes sure it always works. This spreads the use of "valid = xx
> == yy" to more places, though. I think if we just limit the use of
> this expression to a couple of access wrappers then it's not so bad.

Yes, but then we should name the helper so that it is clear that it
is not about 32 bits but about the width of the e->size_ field.

>>> +	if (!e->size_valid) {
>>> +		unsigned long real_size;
>>> +
>>> +		if (sha1_object_info(e->idx.oid.hash, &real_size) < 0 ||
>>> +		    size != real_size)
>>> +			die("BUG: 'size' is supposed to be the object size!");
>>> +	}
>>
>> If an object that is smaller than 4GB is fed to this function with
>> an incorrect size, we happily record it in e->size_ and declare it
>> valid. Wouldn't that be an equally grave error as the one we are
>> catching in this block?
>
> That adds an extra sha1_object_info() to all objects, and it's
> expensive (I think that is one of the reasons we cache these values
> in object_entry in the first place). I think there are also a few
> occasions where we reuse even bad in-pack objects (there are even
> tests for that), so it's not always safe to die() here.

So what?  My point is that I do not see the point in checking whether
the size is correct on only one side (i.e. when the size is too big to
fit in e->size_) and not on the other. If it is worth checking at all
(perhaps under "#ifndef NDEBUG" or some other debug option?), then I'd
think we should spend the cycles on all objects and check.
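
To make the naming point concrete, here is a minimal sketch of what a
width-agnostic helper could look like.  The struct stub, the
OE_SIZE_BITS constant and the name size_fits_in_entry() are made up
for illustration and are not taken from the patch:

    /* illustrative width; not the value the patch actually uses */
    #define OE_SIZE_BITS 31

    /*
     * Stand-in for the real object_entry; only the size_ bit-field
     * matters for this illustration.
     */
    struct object_entry_stub {
            unsigned size_ : OE_SIZE_BITS;
            unsigned size_valid : 1;
    };

    /*
     * The name no longer promises any particular width; the check is
     * written against the bit-field itself, so it stays correct even
     * if the field later shrinks or grows.
     */
    static int size_fits_in_entry(unsigned long size)
    {
            struct object_entry_stub e;

            e.size_ = size;
            return e.size_ == size;
    }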
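
And to make the "check both sides" suggestion concrete, the kind of
debug-only verification I have in mind would live in the setter and
cover every object, not just the ones whose size does not fit.  The
helper name oe_set_size() is illustrative; struct object_entry,
sha1_object_info() and die() are the existing internals the quoted
hunk already uses:

    static void oe_set_size(struct object_entry *e, unsigned long size)
    {
            e->size_ = size;
            e->size_valid = (e->size_ == size);

    #ifndef NDEBUG
            {
                    unsigned long real_size;

                    /* cross-check the caller-supplied size for all objects */
                    if (sha1_object_info(e->idx.oid.hash, &real_size) < 0 ||
                        size != real_size)
                            die("BUG: 'size' is supposed to be the object size!");
            }
    #endif
    }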