Re: [PATCH] git-pack-objects: cache small deltas between big objects

On Sun, May 20, 2007 at 09:54:53PM -0700, Junio C Hamano wrote:
> Martin Koegler <mkoegler@xxxxxxxxxxxxxxxxx> writes:
> > diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
> > index d165f10..13429d0 100644
> > --- a/builtin-pack-objects.c
> > +++ b/builtin-pack-objects.c
> > ...
> > @@ -1294,10 +1302,17 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
> >  	if (!delta_buf)
> >  		return 0;
> >  
> > +	if (trg_entry->delta_data)
> > +		free (trg_entry->delta_data);
> > +	trg_entry->delta_data = 0;
> >  	trg_entry->delta = src_entry;
> >  	trg_entry->delta_size = delta_size;
> >  	trg_entry->depth = src_entry->depth + 1;
> > -	free(delta_buf);
> > +	/* cache delta, if objects are large enough compared to delta size */
> > +	if ((src_size >> 20) + (trg_size >> 21) > (delta_size >> 10))
> > +		trg_entry->delta_data = delta_buf;
> > +	else
> > +		free(delta_buf);
> >  	return 1;
> >  }
> 
> Care to justify this arithmetic?  Why isn't it for example like
> this?
> 
> 	((src_size + trg_size) >> 10) > delta_size

I wanted to avoid a possible overflow in (src_size + trg_size), so
I shift both sides.
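
To illustrate the overflow concern, here is a minimal sketch, not code
from the patch. It assumes 32-bit sizes (uint32_t) so that the
wrap-around is reproducible; the helper names naive_check and
shifted_check are hypothetical.

```c
#include <stdint.h>

/*
 * Naive form: the addition happens first, so for two very large
 * objects the sum can wrap around before the shift is applied.
 */
static int naive_check(uint32_t src_size, uint32_t trg_size,
		       uint32_t delta_size)
{
	return ((src_size + trg_size) >> 10) > delta_size;
}

/*
 * Shift-first form: each term is reduced before the addition,
 * so the sum stays far below the 32-bit limit.
 */
static int shifted_check(uint32_t src_size, uint32_t trg_size,
			 uint32_t delta_size)
{
	return (src_size >> 10) + (trg_size >> 10) > delta_size;
}
```

With two 2 GiB objects (0x80000000 each) the naive sum wraps to 0 and
the check fails, while the shift-first form still accepts a small
delta.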

> I am puzzled by the shifts on both ends, and differences between
> 20 and 21.

I base the maximum allowed delta_size for caching on the memory
required to create the delta. For the src entry, you need a
delta index, which has (about) the same size as the src entry. So I
count the src entry double.

I divide the required memory by 1024, so that the delta size is
several orders of magnitude smaller and will not cause a big increase
in memory usage, e.g.:

For two 100 MB (uncompressed) blobs, we need 300 MB of memory to do the
delta (with the default window size of 10, up to 1900 MB for all delta
indexes in the worst case). The patch will limit the delta size for
the target blob to 150 kB.
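
The 150 kB figure can be checked with a small sketch that wraps the
comparison from the patch in a helper. The name should_cache_delta is
hypothetical and does not exist in git; only the expression itself is
taken from the patch. Ignoring truncation, it accepts a delta when
delta_size is below roughly (2 * src_size + trg_size) / 2048.

```c
/*
 * Same comparison as in the patch: src_size is weighted twice as
 * much as trg_size (shift by 20 versus 21), and delta_size is
 * compared in units of 1024 bytes (shift by 10).
 */
static int should_cache_delta(unsigned long src_size,
			      unsigned long trg_size,
			      unsigned long delta_size)
{
	return (src_size >> 20) + (trg_size >> 21) > (delta_size >> 10);
}
```

For src_size = trg_size = 100 MB the left side is 100 + 50 = 150, so a
delta is cached as long as delta_size >> 10 is at most 149, i.e. the
delta is smaller than 150 * 1024 bytes.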

The caching policy only caches really small deltas for really big
objects, as I wanted to avoid out-of-memory situations. A future patch
should probably replace it with a better strategy.

Regards, Martin Kögler
