Re: [PATCH] diff-delta: produce optimal pack data

On Fri, 24 Feb 2006, Carl Baldwin wrote:

> I see what you're saying about this data reuse helping to speed up
> subsequent cloning operations.  However, if packing takes this long and
> doesn't give me any disk space savings then I don't want to pay the very
> heavy price of packing these files even the first time nor do I want to
> pay the price incrementally.

Of course.  There is admittedly a problem here.  I'm just taking up a
bit of your time to properly identify its parameters.

> The most I would tolerate for the first pack is a few seconds.  The most
> I would tolerate for any incremental pack is about 1 second.

Well, that is probably a bit tight.  Ideally the time should be linear
in the size of the data set to process.  If you have 10 files of 10MB
each, it should take about the same time to pack as 10000 files of 10KB
each.  Of course, incrementally packing one additional 10MB file might
take more than a second even though it is only one file.
 
> BTW, git repack has been going for 30 minutes and has packed 4/36
> objects.  :-)

Pathetic.

> I think the right answer would be for git to avoid trying to pack files
> like this.  Junio mentioned something like this in his message.

Yes.  First we could add an additional parameter to the repacking
strategy: the undeltified but deflated size of an object.  That would
prevent any delta from becoming bigger than the simply deflated
version.
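
As an illustration only, the size check could look something like the
sketch below.  It is not git's actual code: the function names are
hypothetical, and it assumes the delta is itself deflated before being
stored, so the two deflated sizes are compared.

```python
import zlib


def deflated_size(data: bytes) -> int:
    """Size of the data once run through zlib's deflate."""
    return len(zlib.compress(data))


def choose_representation(obj: bytes, delta: bytes) -> str:
    """Pick how to store an object (hypothetical helper).

    Returns "delta" only when the deflated delta is strictly smaller
    than the plain deflated object, and "full" otherwise.
    """
    if deflated_size(delta) < deflated_size(obj):
        return "delta"
    return "full"
```

With a check like this, a delta that ends up no smaller than the
deflated object is simply dropped and the object is stored whole.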

That leaves the delta performance issue.  I think I know what the
problem is.  I'm not sure I know what the best solution would be
though.  The pathological data set is easy to identify quickly, and one
strategy might simply be to bail out early when it happens and
therefore not attempt any delta.
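
A rough sketch of the bail-out idea follows.  The message does not say
what makes the data pathological, so this assumes, purely for
illustration, that the trouble comes from the same fixed-size block
repeating a great many times in the source.  The block size, the
threshold and all names are hypothetical, not taken from diff-delta.

```python
from collections import Counter

BLOCK = 16        # hypothetical block size, in bytes
MAX_REPEATS = 64  # hypothetical limit on identical blocks


def looks_pathological(src: bytes, block: int = BLOCK,
                       max_repeats: int = MAX_REPEATS) -> bool:
    """Return True as soon as one block value repeats too often."""
    counts = Counter()
    for off in range(0, len(src) - block + 1, block):
        key = src[off:off + block]
        counts[key] += 1
        if counts[key] > max_repeats:
            return True  # stop scanning early
    return False


def try_delta(src: bytes, dst: bytes, make_delta):
    """Call make_delta(src, dst) unless src looks pathological.

    Returns None when the delta attempt is skipped.
    """
    if looks_pathological(src):
        return None
    return make_delta(src, dst)
```

The scan is a single pass over the source and stops at the first
offending block, so the check itself stays cheap.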

However, if you could let me play with two samples of your big file I'd
be grateful.  I'd like to make git work well with your kind of data set
too, which is not that uncommon after all.


Nicolas