Re: [PATCHv4 09/10] pack-objects: Estimate pack size; abort early if pack size limit is exceeded

On Monday 23. May 2011, Shawn Pearce wrote:
> We can still get a tighter estimate if we wanted to. I wouldn't mix
> it into this patch, but make a new one on top of it. During delta
> compression we hold onto deltas, or at least compute and retain the
> size of the chosen delta. We could re-check the pack size after the
> Compressing phase by including the delta sizes in the estimate, and
> if we are over, abort before writing.

Ok. Not sure when I'll have the time/courage to dive into this, but I'll 
at least give it a try.

> For non-delta, non-reuse we may be able to guess by just using the
> loose object size. The loose object is most likely compressed at the
> same compression ratio as the outgoing pack stream will use, so a
> deflate(inflate(loose)) cycle is going to be very close in total
> bytes used. If we overshoot the limit by more than some fudge
> factor (say 8K in 1M limit or roughly 0.8%), abort before writing.

I already have an unsubmitted patch on top of the series that includes 
the on-disk/compressed size of loose objects in the estimate. However, 
it's quite intrusive (sha1_object_info() must be extended to return the 
compressed size of loose objects). Also, since I don't yet take 
delta compression into account, these numbers are obviously unreliable.

That said, in the cases where loose objects are not deltified it seems 
the compressed/loose versions are about 3 to 7 bytes larger than the 
corresponding compressed/packed versions. I guess this is due to the 
loose files using a "<type> SP <size> NUL" text header (deflated), 
whereas the pack uses a more compact binary format (not deflated).
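
To make the header difference concrete, here is a rough Python sketch (not git code) that builds both representations for a single non-delta object and compares their sizes. The function names are made up for illustration, and it assumes both sides deflate at zlib's default level; it ignores the pack's file header and trailer entirely.

```python
import zlib

# Object type codes used in the pack entry header.
PACK_TYPE_CODES = {"commit": 1, "tree": 2, "blob": 3, "tag": 4}


def loose_header(obj_type: str, size: int) -> bytes:
    """Text header of a loose object: "<type> SP <size> NUL"."""
    return f"{obj_type} {size}".encode("ascii") + b"\0"


def loose_size(obj_type: str, payload: bytes) -> int:
    """On-disk size of a loose object: header and payload deflated together."""
    return len(zlib.compress(loose_header(obj_type, len(payload)) + payload))


def pack_entry_header(obj_type: str, size: int) -> bytes:
    """Binary pack header: 3 type bits and 4 size bits in the first byte,
    then 7 size bits per byte, with the high bit marking continuation."""
    current = (PACK_TYPE_CODES[obj_type] << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(current | 0x80)  # more size bytes follow
        current = size & 0x7F
        size >>= 7
    out.append(current)
    return bytes(out)


def packed_size(obj_type: str, payload: bytes) -> int:
    """Size of a non-delta pack entry: raw header plus deflated payload."""
    header = pack_entry_header(obj_type, len(payload))
    return len(header) + len(zlib.compress(payload))


if __name__ == "__main__":
    payload = b"hello, pack size estimate\n" * 40
    print("loose :", loose_size("blob", payload))
    print("packed:", packed_size("blob", payload))
```

For a 100-byte blob the text header is 9 bytes before deflating, while the pack header is 2 bytes, so a difference of a few bytes per object is plausible. The exact number depends on how well the text header deflates along with the payload.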

We could test a large corpus (e.g. linux-kernel) to find the average 
difference between compressed/loose size and compressed/packed size, and 
then multiply this by the number of non-delta, non-reuse objects to 
determine the fudge factor you describe above.
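
A small Python sketch of that calculation, just to pin down the arithmetic. The function names and the sample numbers are hypothetical; the real average would have to come from measuring an actual repository.

```python
from statistics import mean


def average_overhead(samples):
    """Mean of (loose size - packed size) over measured objects.

    `samples` is an iterable of (loose_bytes, packed_bytes) pairs.
    """
    return mean(loose - packed for loose, packed in samples)


def fudge_factor(avg_overhead, object_count):
    """Expected overestimate when loose sizes stand in for packed sizes."""
    return avg_overhead * object_count


def should_abort(estimated_size, limit, fudge):
    """Abort before writing only if the estimate overshoots limit + fudge."""
    return estimated_size > limit + fudge


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    samples = [(105, 100), (207, 200), (53, 50)]
    fudge = fudge_factor(average_overhead(samples), object_count=1000)
    print("fudge factor:", fudge)
    print("abort?", should_abort(1_060_000, limit=1_048_576, fudge=fudge))
```

The fudge factor here only covers the header overhead for non-delta, non-reuse objects; it says nothing about delta sizes, which would still need the separate re-check after the Compressing phase.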


Have fun! :)

...Johan

-- 
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net