Preferring shallower deltas on repack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



SBCL, a Lisp implementation, has a file in their (CVS) repository called
"version.lisp-expr" which is updated every commit.  This file looks like:

------------------------------------------------------------------------
;;; This is the master value for LISP-IMPLEMENTATION-VERSION. It's
;;; separated into its own file here so that it's easy for
;;; text-munging make-ish or cvs-ish scripts to find and tweak it. For
;;; the convenience of such scripts, only a simple subset of Lisp
;;; reader syntax should be used here: semicolon-delimited comments,
;;; possible blank lines or other whitespace, and a single
;;; double-quoted string value alone on its own line.
;;;
;;; ANSI says LISP-IMPLEMENTATION-VERSION can be NIL "if no
;;; appropriate and relevant result can be produced", but as long as
;;; we control the build, we can always assign an appropriate and
;;; relevant result, so this must be a string, not NIL.
;;;
;;; Conventionally a string like "0.6.6", with three numeric fields,
;;; is used for released versions, and a string like "0.6.5.xyzzy",
;;; with something arbitrary in the fourth field, is used for CVS
;;; checkins which aren't released. (And occasionally for internal
;;; versions, especially for internal versions off the main CVS
;;; branch, it gets hairier, e.g. "0.pre7.14.flaky4.13".)
"1.0.7.10"
------------------------------------------------------------------------

Only very rarely does anything but the last line change.

The current repack implementation, when given values like "--window=100
--depth=1000", happily generates an incredibly deep tree for this file,
maxing out the depth.  (I'm only using such a high depth to demonstrate
the pessimal behavior.)  I noticed that it just takes the first delta
it finds that is the smallest; it then only looks for deltas with a
max_size of the old size - 1, so it can never find better matches with
regard to depth.

I modified this to prefer shallower deltas of the same size.  This made
the deltas for this file a very wide tree with a maximum depth of about
65.  Other (much smaller) improvements were seen elsewhere in the pack.
Runtime does not seem to have been affected, as most of the work had
already been done when it was tossing deltas before.

Some simple statistics:

SBCL, standard pack-objects, window 100, depth 1000:
  Max depth: 980
  Mean depth: 100.223622114502
  Median depth: 12
  Standard deviation: 188.214331919176

SBCL, patched pack-objects, window 100, depth 1000:
  Max depth: 787
  Mean depth: 61.5669990817656
  Median depth: 11
  Standard deviation: 127.644652607399

git, standard pack-objects, window 100, depth 1000:
  Max depth: 925
  Mean depth: 77.184264479754
  Median depth: 8
  Standard deviation: 150.112998198182

git, patched pack-objects, window 100, depth 1000:
  Max depth: 913
  Mean depth: 74.9981877425496
  Median depth: 7
  Standard deviation: 147.900721785959

The only negative effect I could see from this patch might be pack
locality.  Unfortunately I don't know enough (read: anything) about pack
access patterns to determine if this is the case.

Patch to follow.

-bcd
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux