Some more pack-objects tweaks

I've been working on more pack-objects improvements.  There will be
two tweaks in the "next" branch I'll be pushing out tonight.

 * rev-list reports full pathnames, not just basenames, for
   contained trees and blobs.  pack-objects hashes the incoming
   names (and names obtained from "negative" trees when
   --objects-edge, aka "thin pack", is used), taking into account
   both the dirname and the basename part.

   Earlier, I had a patch that hashed the whole pathname, and
   found that it performed worse than the original "hash just the
   basename" approach, so I never published it.  The idea in
   this round is to give "Makefile" and "t/Makefile" different
   but close hash values.  Type-size sort groups "Makefile"s
   from different revs together, and a group of "t/Makefile"s
   is found close by.

 * when creating a "thin" pack, disable the code that keeps
   reused deltas from making a delta chain too long (see the
   15b4d57 and ab7cd7b commit logs for details).

   This is because limiting the delta chain is more costly than
   letting it grow by using preexisting deltas, and a "thin" pack
   is used by first exploding it, so at that point delta depth
   does not matter.
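
To illustrate the first tweak, here is a minimal sketch of one way to
give "Makefile" and "t/Makefile" different but close hash values.
It is not the code in the patch.  The names path_hash, hash_bytes and
DIR_BITS, the multiplier, and the bit split are all assumptions made
up for this example: the basename fills the high bits, so equal
basenames sort together, and the dirname only perturbs the low bits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical split: the low DIR_BITS bits come from the dirname,
 * the remaining high bits from the basename. */
#define DIR_BITS 12

/* Simple multiplicative hash over len bytes (illustrative only). */
static uint32_t hash_bytes(const char *s, size_t len)
{
	uint32_t h = 0;
	for (size_t i = 0; i < len; i++)
		h = h * 11 + (unsigned char)s[i];
	return h;
}

/* Combine basename and dirname hashes so that paths sharing a
 * basename differ only in the low DIR_BITS bits. */
uint32_t path_hash(const char *path)
{
	const char *slash = strrchr(path, '/');
	const char *base = slash ? slash + 1 : path;
	size_t dir_len = slash ? (size_t)(slash - path) : 0;
	uint32_t base_hash = hash_bytes(base, strlen(base));
	uint32_t dir_hash = hash_bytes(path, dir_len);

	return (base_hash << DIR_BITS) |
	       (dir_hash & ((1u << DIR_BITS) - 1));
}
```

With this layout, sorting by hash keeps every "Makefile" in one run
and orders the entries within that run by directory, which is the
kind of locality the delta window can exploit.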

In the Linux 2.6 repository, I've created a thin pack for
v2.6.14..v2.6.15-rc1 (36k objects).  Here are the results:

    [without either patch]
    15463034 bytes
    Total 36248, written 36248 (delta 29046), reused 28306 (delta 22512)
    real    1m38.157s       user    1m32.520s       sys     0m5.440s

    [with full names]
    11138621 bytes
    Total 36248, written 36248 (delta 30368), reused 27918 (delta 22512)
    real    1m36.254s       user    1m28.650s       sys     0m5.470s

    [with full names, and allowing deeper delta]
    9971223 bytes
    Total 36248, written 36248 (delta 30868), reused 27429 (delta 22512)
    real    1m36.923s       user    1m29.770s       sys     0m5.470s
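
To put the byte counts above in relative terms, a small helper like
the following can be used (percent_saved is a name invented for this
example, not something in git).  Applied to the figures above, full
names alone save roughly 28% over the baseline, and full names plus
deeper deltas save roughly 35.5%, while the run time stays about the
same.

```c
/* Percentage by which `after` is smaller than `before`. */
double percent_saved(unsigned long before, unsigned long after)
{
	return 100.0 * (double)(before - after) / (double)before;
}
```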

All of these tests were done with the last patch in Nico's delta
enhancement series reverted, because the dataset used in this
test triggers a corner case performance disaster in it (I've
sent a message separately).
