Re: PATCH: New diff-delta.c implementation (updated)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Geert Bosch <bosch@xxxxxxxx> writes:

> Even though the previous version did really well on large files
> with many changes, performance was lacking for the many small
> files with very few changes that are so common for a VCS.
>...
> The result has been only a slight increase in delta size for
> very large test cases (but with better performance), and
> both smaller deltas and faster execution speed for repacking
> git.git. I had trouble cloning the Linux kernel repository,
> but am now reasonably confident this will outperform the
> existing algorithm pretty consistently.

Interesting.

Initial impression, the same test as before (a full packing of
the git.git repository that does not have _any_ pack -- all 18k
objects are loose).

First, the incumbent, with the "reusing delta-index" patch applied.

Total 17724, written 17724 (delta 12002), reused 0 (delta 0)
34.02user 6.48system 0:42.87elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+434478minor)pagefaults 0swaps

 6188418 pack-nico-f1fac077a093ffdaf094aab2b7f11859ec0c18f1.pack

Then diff-delta.c replaced with your version.

Total 17724, written 17724 (delta 12012), reused 0 (delta 0)
44.87user 6.54system 0:54.01elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+441124minor)pagefaults 0swaps

 6099183 pack-geert-f1fac077a093ffdaf094aab2b7f11859ec0c18f1.pack

Second impression, in a recent kernel tree which is mostly
packed.  Packing 41k objects (v2.6.16..v2.6.17-rc3), with
"git-pack-objects --no-reuse-delta".

(Nico)
Total 41591, written 41591 (delta 29285), reused 8563 (delta 0)
169.08user 12.60system 3:27.68elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2major+1099928minor)pagefaults 0swaps

37363966 pack-nico-b9e4339c482cb7d787a2117e6da6eb2114053abc.pack

(Geert)
Total 41591, written 41591 (delta 29347), reused 8427 (delta 0)
243.71user 12.32system 4:28.11elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1077843minor)pagefaults 0swaps

37165890 pack-geert-b9e4339c482cb7d787a2117e6da6eb2114053abc.pack


Of course, the absolute numbers do not matter, but for the
record these are on my Duron 750, 760MB or so RAM and with
relatively slow disks.

In the kernel repository (checked out is near the tip of the
source tree), the largest files are fs/nls/nls_cp949.c (900kB
korean character encoding), drivers/usb/misc/emi62_fw_s.h
(800kB, Emagic firmware blob), arch/m68k/ifpsp060/src/fpsp.S
(750kB, floating point emulation?), and nowhere near your
algorithm really should shine.

We would probably want some internal logic that says "if we see
that blobs larger than X MB is involved in the packing, we
should use this version of diff-delta, otherwise the other one."

-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]