Re: [PATCH] diff-delta: produce optimal pack data

On Fri, Feb 24, 2006 at 01:57:20PM -0500, Nicolas Pitre wrote:
> On Fri, 24 Feb 2006, Carl Baldwin wrote:
> 
> > My version is 1.2.1.  I have not been following your work.  When was
> > pack data reuse introduced?
> 
> Try out version 1.2.3.

I'm on it now.

> > From where can I obtain your delta patches?
> 
> Forget them for now -- they won't help you.

Ah, I have been looking at your patches and clearly they will not help.

> > There is really no opportunity for pack-data reuse in this case.  The
> > repository had never been packed or cloned in the first place.  As I
> > said, I do not intend to pack these binary files at all since there is
> > no benefit in this case.
> 
> Yes there is, as long as you have version 1.2.3.  The clone logic will 
> simply reuse already packed data without attempting to recompute it.

I meant that there is no benefit in disk space usage.  Packing may
actually increase my disk space usage in this case.  Refer to what I
said about experimentally running gzip and xdelta on the files
independently of git.
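
For concreteness, that experiment looked something like this (the file
names here are made up for illustration):

    $ gzip -c design-v2.bin | wc -c        # deflate alone
    $ xdelta delta design-v1.bin design-v2.bin d.xd
    $ wc -c d.xd                           # delta between two revisions

Neither the gzip output nor the xdelta patch came out meaningfully
smaller than the original file.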

I see what you're saying about this data reuse helping to speed up
subsequent cloning operations.  However, if packing takes this long and
gives me no disk-space savings, then I don't want to pay the very heavy
price of packing these files even the first time, nor do I want to pay
that price incrementally.

The most I would tolerate for the first pack is a few seconds.  The most
I would tolerate for any incremental pack is about 1 second.

BTW, git repack has been going for 30 minutes and has packed 4/36
objects.  :-)

> I think you really should try git version 1.2.3 with a packed 
> repository.  It might handle your special case just fine.

No, not when I'm not willing to pay the price to pack even once.  This
isn't a case where I have one such repository and 'once it's been packed
then it's packed'.  This is only one example of such a repository.  I am
looking for a process for revision-controlling this type of data that
will be used over and over.  Git may not be the answer here, but it sure
is looking good in many other ways.

I think the right answer would be for git to avoid trying to pack files
like this.  Junio mentioned something like this in his message.
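
Something as simple as a per-path switch to turn off deltification
would cover it.  A sketch, with hypothetical syntax (no such knob
exists in 1.2.3 as far as I know):

    # hypothetical: mark these files as not worth delta-compressing
    $ echo '*.bin -delta' >> .gitattributes

Whatever form the knob takes, the point is to tell git up front not to
burn CPU deltifying files that are known to be incompressible.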

Thanks for your input.

Cheers,
Carl

-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Carl Baldwin                        RADCAD (R&D CAD)
 Hewlett Packard Company
 MS 88                               work: 970 898-1523
 3404 E. Harmony Rd.                 work: Carl.N.Baldwin@xxxxxx
 Fort Collins, CO 80525              home: Carl@xxxxxxxxxxxxx
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
