Re: Preferring shallower deltas on repack

On Sun, Jul 08, 2007 at 10:31:43PM -0700, Junio C Hamano wrote:
> Putting aside a potential argument that the way the file in
> question, version.lisp-expr, is kept track of might be insane,
> this is an interesting topic.

Yeah, that version numbering system worked quite well for CVS, given its
lack of any other kind of useful whole-tree versioning and the fact
that there wasn't much branching and merging, since that was a pain in
the ass.  If and when we move to something like Git, something else will
have to be done, as that file will /always/ be in conflict.

> In addition to the above stats, it may be interesting to know:
> 
>  - pack generation time and memory footprint (/usr/bin/time);
> 
>    I suspect you would have to try_delta more candidates, so
>    this may degrade a bit, but that is done for getting a better
>    deltification, so we would need to see if the extra cost is
>    within reason and worth spending.

It was already try_delta'ing everything in the window.  The only
difference now is that create_delta may generate one more byte of delta
before giving up.  That doesn't seem to have affected things at all
outside of sampling noise:

(These timings are for the Git pack on Linux/amd64, --window and --depth
both 100.  Since /usr/bin/time doesn't seem to report any useful memory
statistics on Linux, I also have a "ps aux" line from when the memory
size looked stable.  This was different from run to run but it shows the
two are in the same order of magnitude.)

Unpatched:
54.99user 0.18system 0:56.80elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (14major+32417minor)pagefaults 0swaps
bdowning  5290 98.7  4.5 106788 92900 pts/1    R+   01:26   0:49 git pack-obj

Patched:
55.37user 0.19system 0:56.35elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32249minor)pagefaults 0swaps
bdowning  6086  100  4.5 106880 92996 pts/1    R+   01:29   0:49 git pack-obj
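The patch is only described in prose here, so what follows is a minimal sketch of one reading of the selection rule, not Git's actual `try_delta`/`create_delta` code. It assumes each candidate base in the window yields a delta of known size and that the base already sits at a known chain depth. `Candidate`, `pick_base`, `max_depth` and `prefer_shallow` are hypothetical names. The key detail is the size budget: the unpatched search only accepts a delta strictly smaller than the best so far, while the patched one lets the delta grow one byte more so that an equal-sized delta can be kept when its base is shallower.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for one base object in the delta window."""
    name: str        # label for the base object
    delta_size: int  # bytes needed to encode the target as a delta of it
    base_depth: int  # delta-chain depth the base itself already sits at


def pick_base(candidates: Sequence[Candidate], max_depth: int,
              prefer_shallow: bool) -> Optional[Candidate]:
    """Pick a delta base; None means no candidate fits the depth limit."""
    best: Optional[Candidate] = None
    for cand in candidates:
        # The new delta would sit one level below its base.
        if cand.base_depth + 1 > max_depth:
            continue
        if best is None:
            best = cand
            continue
        # Unpatched: only a strictly smaller delta is accepted (budget is
        # best size - 1). Patched: the budget is one byte larger, so a delta
        # of equal size is also generated and compared on depth.
        budget = best.delta_size if prefer_shallow else best.delta_size - 1
        if cand.delta_size > budget:
            continue
        # Smaller always wins; an equal-sized delta wins only if shallower.
        if cand.delta_size < best.delta_size or cand.base_depth < best.base_depth:
            best = cand
    return best
```

With candidates of 100, 100 and 120 bytes on bases at depths 5, 1 and 0, the strict rule keeps the first 100-byte delta (depth-5 base), while the patched rule switches to the depth-1 base. Neither ever takes the 120-byte delta, so size is never traded for depth.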

>  - resulting pack size (ls -l pack-*.pack)
> 
>    I do not expect your change would degrade in this area, as
>    you are currently not trading size with shallower delta
>    depth.

The patched version is actually smaller in both SBCL's and Git's case
(again, --window 100 and --depth 100):

SBCL: 61696 bytes smaller (13294225-13232529)
Git:  16010 bytes smaller (12690424-12674414)
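For scale, here are those savings as a fraction of the unpatched pack size. The `savings` helper is only for illustration; the byte counts are the ones quoted above.

```python
from typing import Tuple


def savings(before: int, after: int) -> Tuple[int, float]:
    """Return (bytes saved, saving as a percent of the original size)."""
    saved = before - after
    return saved, 100.0 * saved / before


for label, before, after in [
    ("SBCL", 13294225, 13232529),
    ("Git", 12690424, 12674414),
]:
    saved, pct = savings(before, after)
    print(f"{label}: {saved} bytes smaller ({pct:.3f}%)")
```

This prints `SBCL: 61696 bytes smaller (0.464%)` and `Git: 16010 bytes smaller (0.126%)`: small, but in the right direction.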

I believe the reason for this is that more objects can be stored as
deltas without exceeding the depth limit.  If I repack the Git pack with
--depth=999999999, the patched version generates a pack that is 1793
bytes smaller (12334183-12332390).  (Hmm, I was expecting those to be the
same; I'm not sure why they're not.  Padding?)
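A toy model of that explanation, built on loud assumptions: every delta costs the same (as might roughly hold for a file like version.lisp-expr), objects arrive in order, each may use any of the previous `window` objects as a base, candidates are scanned most recent first, and without the patch the first candidate that fits the depth limit wins. `count_whole_objects` is hypothetical and does not reproduce how pack-objects actually sorts or scans its window.

```python
from typing import List


def count_whole_objects(n_objects: int, window: int, max_depth: int,
                        prefer_shallow: bool) -> int:
    """Toy model: all deltas cost the same, so only depth breaks ties.

    Returns how many objects must be stored whole because no base in
    the window would keep the new delta within max_depth.
    """
    depths: List[int] = []
    whole = 0
    for i in range(n_objects):
        # Depths of the previous `window` objects, most recent first.
        cands = [depths[j] for j in range(i - 1, max(i - window, 0) - 1, -1)]
        usable = [d for d in cands if d + 1 <= max_depth]
        if not usable:
            depths.append(0)          # no base fits: store the object whole
            whole += 1
        elif prefer_shallow:
            depths.append(min(usable) + 1)  # patched: shallowest base
        else:
            depths.append(usable[0] + 1)    # assumed: first fitting base
    return whole
```

With `window=2` and `max_depth=3`, the first-fit rule stores a full copy every 5 objects and the shallowest-base rule every 7, so over 20 objects that is 4 whole copies versus 3. With no depth pressure the two rules behave the same.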

> Regarding your patch, I think it does not look too bad, as you
> never pick delta that is larger than the best-so-far in favor of
> shallower depth.
> 
> It would become worrisome (*BUT* infinitely more interesting)
> once you start talking about a tradeoff between slightly larger
> delta and much shorter delta.  Such a tradeoff, if done right,
> would make a lot of sense, but I do not offhand think of a way
> to strike a proper balance between them efficiently.

Yeah, I was thinking about that too, and came to the same conclusion.
I suspect you'd have to save a /lot/ of delta depth to want to pay any
more I/O, though.

Another idea, possibly iffy (and complicated): if you keep making a good
low-depth delta off of a particular object, it might be worth promoting
that object so it stays in the window longer.
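The promotion idea can be sketched with a window that evicts its oldest entry, assuming plain FIFO eviction. The `Window` class and its `promote` method are hypothetical; deciding when a delta is good and shallow enough to earn a promotion is left open, just as it is here.

```python
from collections import deque
from typing import Deque, Optional


class Window:
    """Hypothetical fixed-size delta window that evicts its oldest entry.

    promote() moves an entry to the newest end, so an object that keeps
    serving as a good base survives more evictions.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self.entries: Deque[str] = deque()

    def add(self, name: str) -> Optional[str]:
        """Insert a new object; return the evicted name, if any."""
        evicted = None
        if len(self.entries) == self.size:
            evicted = self.entries.popleft()  # oldest entry leaves first
        self.entries.append(name)
        return evicted

    def promote(self, name: str) -> None:
        """Move `name` to the newest position (no-op if absent)."""
        if name in self.entries:
            self.entries.remove(name)
            self.entries.append(name)
```

Without promotion, adding a fourth object to a full three-slot window evicts the first one. If the first object was promoted because it just produced a good shallow delta, the second object is evicted instead.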

-bcd