Re: PATCH: New diff-delta.c implementation (updated)

Junio C Hamano <junkio@xxxxxxx> writes:

> In the kernel repository (the checkout is near the tip of the
> source tree), the largest files are fs/nls/nls_cp949.c (900kB,
> Korean character encoding), drivers/usb/misc/emi62_fw_s.h
> (800kB, Emagic firmware blob), and arch/m68k/ifpsp060/src/fpsp.S
> (750kB, floating point emulation?), all nowhere near the size
> where your algorithm really should shine.
>
> We would probably want some internal logic that says "if we see
> that blobs larger than X MB are involved in the packing, we
> should use this version of diff-delta, otherwise the other one."
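
A rough sketch of such a size-based switch, in Python for brevity
rather than as anything resembling the actual packing code.  The
threshold value, the function name choose_delta_impl and the two
labels it returns are all hypothetical; the quoted text leaves "X"
unspecified.

```python
# Hypothetical threshold: the message leaves "X MB" unspecified,
# so 1 MiB here is only a placeholder.
LARGE_BLOB_THRESHOLD = 1 * 1024 * 1024  # bytes


def choose_delta_impl(blob_sizes, threshold=LARGE_BLOB_THRESHOLD):
    """Pick a diff-delta variant label for one packing run.

    blob_sizes is an iterable of uncompressed blob sizes in bytes.
    Returns "large-blob" if any blob exceeds the threshold,
    otherwise "default".  Both labels are made up for illustration.
    """
    if any(size > threshold for size in blob_sizes):
        return "large-blob"
    return "default"


# The three largest kernel files quoted above stay on the default path,
# while a blob of about 2.3MB crosses the placeholder threshold.
kernel_largest = [900_000, 800_000, 750_000]
print(choose_delta_impl(kernel_largest))
print(choose_delta_impl([2_300_000]))
```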

Third impression, synthetic workload.  A sequence of revisions of a
single-file project, where the file is a tarball of the git.git tree
(that is, "git-tar-tree vX.Y.Z >tarball"); 120 objects or so (1 commit
per rev, 1 tree to hold 1 blob).  The (uncompressed) sizes of the 40
blobs in the pack are between 2.06MB and 2.86MB (average 2.30MB).

(Nico)
Total 123, written 123 (delta 38), reused 0 (delta 0)
67.26user 1.03system 1:08.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+136066minor)pagefaults 0swaps

1822079 pack-nico-26989d516c62197592d0d52db24dfc6a58b633eb.pack


(Geert)
Total 123, written 123 (delta 38), reused 0 (delta 0)
67.23user 1.35system 1:09.25elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+164124minor)pagefaults 0swaps

1683139 pack-geert-26989d516c62197592d0d52db24dfc6a58b633eb.pack

That's an 8% smaller pack in about the same time, which is quite
impressive.  But I am _very_ unhappy about this particular
synthetic workload.  I wonder if there are projects with many
large blobs that are updated often, so that we can use one as a
yardstick.  Maybe the Wine people have icons, background images and
sounds?  But I suspect you would not update them that often.
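
For reference, the size difference can be checked from the two pack
sizes listed above.  A small Python sketch of the arithmetic (the
function name is made up for illustration):

```python
def pack_size_reduction(baseline_bytes, candidate_bytes):
    """Fraction by which candidate is smaller than baseline."""
    return (baseline_bytes - candidate_bytes) / baseline_bytes


nico_pack = 1822079   # bytes, pack-nico-*.pack
geert_pack = 1683139  # bytes, pack-geert-*.pack

reduction = pack_size_reduction(nico_pack, geert_pack)
print(f"{reduction:.1%}")  # prints 7.6%
```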

Thinking about it, it does not make much sense, at least to me,
to store large tarballs or binary blobs or whatnot in an SCM (we
are _not_ in the archival business) and keep track of their
changes.  The tarball is out of the question: it is not source
in the GPL sense of the word, since it is not the preferred form
for making modifications (you modify the constituent files and
bundle up the result as a new tarball).  Graphics images, perhaps.

