[PATCH v2 00/10] gc docs: modernize and fix the documentation

For v1 see: https://public-inbox.org/git/20190318161502.7979-1-avarab@xxxxxxxxx/

This addresses all the feedback it got, which includes splitting out
various "while we're at it" fixes, and then I found/remembered some
more things I needed to fix.

It would still be great to have Peff submit some version of his
https://public-inbox.org/git/20190319001829.GL29661@xxxxxxxxxxxxxxxxxxxxx/
reply to the NOTES section sometime, but I had to stop somewhere.

I also documented the fast-import caveat discussed in
https://public-inbox.org/git/87o964cnn0.fsf@xxxxxxxxxxxxxxxxxxx/ while
I was at it, as promised.

Ævar Arnfjörð Bjarmason (10):
  gc docs: modernize the advice for manually running "gc"
  gc docs: stop noting "repack" flags
  gc docs: clean grammar for "gc.bigPackThreshold"
  gc docs: include the "gc.*" section from "config" in "gc"
  gc docs: re-flow the "gc.*" section in "config"
  gc docs: note how --aggressive impacts --window & --depth
  gc docs: downplay the usefulness of --aggressive
  gc docs: note "gc --aggressive" in "fast-import"
  gc docs: clarify that "gc" doesn't throw away referenced objects
  gc docs: remove incorrect reference to gc.auto=0

 Documentation/config/gc.txt       |  34 ++++++-
 Documentation/git-fast-import.txt |   7 ++
 Documentation/git-gc.txt          | 142 ++++++++++--------------------
 3 files changed, 84 insertions(+), 99 deletions(-)

Range-diff:
 1:  d48b9c7221 !  1:  89719142c7 gc docs: modernize the advice for manually running "gc"
    @@ -3,10 +3,17 @@
         gc docs: modernize the advice for manually running "gc"
     
         The docs have been recommending that users need to run this manually,
    -    but that hasn't been needed in practice for a long time.
    +    but that hasn't been needed in practice for a long time except in
    +    exceptional circumstances.
     
         Let's instead have this reflect reality and say that most users don't
    -    need to run this manually at all.
    +    need to run this manually at all, while briefly describing the
    +    sorts of cases where "gc" does need to be run manually.
    +
    +    Since we're recommending that users run this most of the time and usually
    +    don't need to tweak it, let's tone down the very prominent example of
    +    the gc.auto=0 command. It's sufficient to point to the gc.auto
    +    documentation below.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
     
    @@ -20,20 +27,24 @@
     -Users are encouraged to run this task on a regular basis within
     -each repository to maintain good disk space utilization and good
     -operating performance.
    -+Most users should not have to run this command manually. When common
    -+porcelain operations that create objects are run, such as
    -+linkgit:git-commit[1] and linkgit:git-fetch[1], `git gc --auto` will
    -+be run automatically.
    - 
    +-
     -Some git commands may automatically run 'git gc'; see the `--auto` flag
     -below for details. If you know what you're doing and all you want is to
     -disable this behavior permanently without further considerations, just do:
    -+You should only need to run `git gc` manually when adding objects to a
    -+repository without regularly running such porcelain commands. Another
    -+use-case is wanting to do a one-off repository optimization.
    +-
    +-----------------------
    +-$ git config --global gc.auto 0
    +-----------------------
    ++When common porcelain operations that create objects are run, they
    ++will check whether the repository has grown substantially since the
    ++last maintenance, and if so run `git gc` automatically. See `gc.auto`
    ++below for how to disable this behavior.
     +
    -+If you know what you're doing and all you want is to disable automatic
    -+runs, do:
    ++Running `git gc` manually should only be needed when adding objects to
    ++a repository without regularly running such porcelain commands, to do
    ++a one-off repository optimization, or e.g. to clean up a suboptimal
    ++mass-import. See the "PACKFILE OPTIMIZATION" section in
    ++linkgit:git-fast-import[1] for more details on the import case.
      
    - ----------------------
    - $ git config --global gc.auto 0
    + OPTIONS
    + -------
 -:  ---------- >  2:  d90a5b1b4c gc docs: stop noting "repack" flags
 -:  ---------- >  3:  fedd9bb886 gc docs: clean grammar for "gc.bigPackThreshold"
 2:  e670d514ce !  4:  6fad05a67c gc docs: include the "gc.*" section from "config" in "gc"
    @@ -34,16 +34,52 @@
      
      gc.auto::
      	When there are approximately more than this many loose
    + 	objects in the repository, `git gc --auto` will pack them.
    + 	Some Porcelain commands use this command to perform a
    + 	light-weight garbage collection from time to time.  The
    +-	default value is 6700.  Setting this to 0 disables it.
    ++	default value is 6700.
    +++
    ++Setting this to 0 disables not only automatic packing based on the
    ++number of loose objects, but any other heuristic `git gc --auto` will
    ++otherwise use to determine if there's work to do, such as
    ++`gc.autoPackLimit`.
    + 
    + gc.autoPackLimit::
    + 	When there are more than this many packs that are not
    + 	marked with `*.keep` file in the repository, `git gc
    + 	--auto` consolidates them into one larger pack.  The
    + 	default	value is 50.  Setting this to 0 disables it.
    ++	Setting `gc.auto` to 0 will also disable this.
    +++
    ++See the `gc.bigPackThreshold` configuration variable below. When in
    ++use, it'll affect how the auto pack limit works.
    + 
    + gc.autoDetach::
    + 	Make `git gc --auto` return immediately and run in background
    +@@
    + this configuration variable is ignored, all packs except the base pack
    + will be repacked. After this the number of packs should go below
    + gc.autoPackLimit and gc.bigPackThreshold should be respected again.
    +++
    ++If the amount of memory estimated for `git repack` to run smoothly is
    ++not available and `gc.bigPackThreshold` is not set, the largest
    ++pack will also be excluded (this is the equivalent of running `git gc`
    ++with `--keep-base-pack`).
    + 
    + gc.writeCommitGraph::
    + 	If true, then gc will rewrite the commit-graph file when
     @@
      	With "<pattern>" (e.g. "refs/stash")
      	in the middle, the setting applies only to the refs that
      	match the <pattern>.
     ++
    -+These types of entries are generally created as a result of using `git
    -+commit --amend` or `git rebase` and are the commits prior to the amend
    -+or rebase occurring. Since these changes are not part of the current
    -+project history most users will want to expire them sooner, which is
    -+why the default is more aggressive than `gc.reflogExpire`.
    ++These types of entries are generally created as
    ++a result of using `git commit --amend` or `git rebase` and are the
    ++commits prior to the amend or rebase occurring.  Since these changes
    ++are not part of the current project most users will want to expire
    ++them sooner, which is why the default is more aggressive than
    ++`gc.reflogExpire`.
      
      gc.rerereResolved::
      	Records of conflicted merge you resolved earlier are
    @@ -52,15 +88,38 @@
      --- a/Documentation/git-gc.txt
      +++ b/Documentation/git-gc.txt
     @@
    - repository without regularly running such porcelain commands. Another
    - use-case is wanting to do a one-off repository optimization.
    - 
    --If you know what you're doing and all you want is to disable automatic
    --runs, do:
    -+If you know what you're doing and want to disable automatic runs, do:
    + --auto::
    + 	With this option, 'git gc' checks whether any housekeeping is
    + 	required; if not, it exits without performing any work.
    +-	Some git commands run `git gc --auto` after performing
    +-	operations that could create many loose objects. Housekeeping
    +-	is required if there are too many loose objects or too many
    +-	packs in the repository.
    + +
    +-If the number of loose objects exceeds the value of the `gc.auto`
    +-configuration variable, then all loose objects are combined into a
    +-single pack.  Setting the value of `gc.auto`
    +-to 0 disables automatic packing of loose objects.
    ++See the `gc.auto` option in the "CONFIGURATION" section below for how
    ++this heuristic works.
    + +
    +-If the number of packs exceeds the value of `gc.autoPackLimit`,
    +-then existing packs (except those marked with a `.keep` file
    +-or over `gc.bigPackThreshold` limit)
    +-are consolidated into a single pack.
    +-If the amount of memory estimated for `git repack` to run smoothly is
    +-not available and `gc.bigPackThreshold` is not set, the largest
    +-pack will also be excluded (this is the equivalent of running `git gc`
    +-with `--keep-base-pack`).
    +-Setting `gc.autoPackLimit` to 0 disables automatic consolidation of
    +-packs.
    +-+
    +-If houskeeping is required due to many loose objects or packs, all
    ++Once housekeeping is triggered by exceeding the limits of
    ++configuration options such as `gc.auto` and `gc.autoPackLimit`, all
    + other housekeeping tasks (e.g. rerere, working trees, reflog...) will
    + be performed as well.
      
    - ----------------------
    - $ git config --global gc.auto 0
     @@
      CONFIGURATION
      -------------
 3:  d6f1e001a4 !  5:  994e22a0d6 gc docs: de-duplicate "OPTIONS" and "CONFIGURATION"
    @@ -1,18 +1,10 @@
     Author: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
     
    -    gc docs: de-duplicate "OPTIONS" and "CONFIGURATION"
    +    gc docs: re-flow the "gc.*" section in "config"
     
    -    In an earlier commit I started including the "gc.*" documentation from
    -    git-config(1) in the git-gc(1) documentation. That still left us in a
    -    state where the "--auto" option and "gc.auto" were redundantly
    -    discussing the same thing.
    -
    -    Fix that by briefly discussing how the option itself works for
    -    "--auto", and for the rest referring to the configuration
    -    documentation.
    -
    -    This revealed existing blind spots in the configuration documentation,
    -    move over the documentation and reword as appropriate.
    +    Re-flow the "gc.*" section in "config". A previous commit moved this
    +    over from the "gc" docs, but tried to keep as many of the lines
    +    identical to benefit from diff's move detection.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
     
    @@ -20,91 +12,33 @@
      --- a/Documentation/config/gc.txt
      +++ b/Documentation/config/gc.txt
     @@
    - 	objects in the repository, `git gc --auto` will pack them.
    - 	Some Porcelain commands use this command to perform a
    - 	light-weight garbage collection from time to time.  The
    --	default value is 6700.  Setting this to 0 disables it.
    -+	default value is 6700.
    -++
    -+Setting this to 0 disables not only automatic packing based on the
    -+number of loose objects, but any other heuristic `git gc --auto` will
    -+otherwise use to determine if there's work to do, such as
    -+`gc.autoPackLimit`.
    -++
    -+The repacking of loose objects will be performed with `git repack -d
    -+-l`.
    - 
    - gc.autoPackLimit::
    -+
    - 	When there are more than this many packs that are not
    - 	marked with `*.keep` file in the repository, `git gc
    - 	--auto` consolidates them into one larger pack.  The
    --	default	value is 50.  Setting this to 0 disables it.
    -+	default value is 50.  Setting this (or `gc.auto`) to 0
    -+	disables it. Packs will be consolidated using the `-A` option
    -+	of `git repack`.
    -++
    -+See the `gc.bigPackThreshold` configuration variable below. When in
    -+use it'll effect how the auto pack limit works.
    - 
    - gc.autoDetach::
    - 	Make `git gc --auto` return immediately and run in background
    -@@
    - 	If non-zero, all packs larger than this limit are kept when
    - 	`git gc` is run. This is very similar to `--keep-base-pack`
    - 	except that all packs that meet the threshold are kept, not
    --	just the base pack. Defaults to zero. Common unit suffixes of
    --	'k', 'm', or 'g' are supported.
    -+	just the base pack. Defaults to zero or a memory heuristic.
    -+	Common unit suffixes of 'k', 'm', or 'g' are supported.
    - +
    - Note that if the number of kept packs is more than gc.autoPackLimit,
    - this configuration variable is ignored, all packs except the base pack
    - will be repacked. After this the number of packs should go below
      gc.autoPackLimit and gc.bigPackThreshold should be respected again.
    -++
    -+If the amount of memory is estimated not enough for `git repack` to
    -+run smoothly and `gc.bigPackThreshold` is not set, the largest pack
    -+will also be excluded (which is the equivalent of running `git gc`
    -+with `--keep-base-pack`).
    + +
    + If the amount of memory estimated for `git repack` to run smoothly is
    +-not available and `gc.bigPackThreshold` is not set, the largest
    +-pack will also be excluded (this is the equivalent of running `git gc`
    +-with `--keep-base-pack`).
    ++not available and `gc.bigPackThreshold` is not set, the largest pack
    ++will also be excluded (this is the equivalent of running `git gc` with
    ++`--keep-base-pack`).
      
      gc.writeCommitGraph::
      	If true, then gc will rewrite the commit-graph file when
    -
    - diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
    - --- a/Documentation/git-gc.txt
    - +++ b/Documentation/git-gc.txt
     @@
    - --auto::
    - 	With this option, 'git gc' checks whether any housekeeping is
    - 	required; if not, it exits without performing any work.
    --	Some git commands run `git gc --auto` after performing
    --	operations that could create many loose objects. Housekeeping
    --	is required if there are too many loose objects or too many
    --	packs in the repository.
    + 	in the middle, the setting applies only to the refs that
    + 	match the <pattern>.
      +
    --If the number of loose objects exceeds the value of the `gc.auto`
    --configuration variable, then all loose objects are combined into a
    --single pack using `git repack -d -l`.  Setting the value of `gc.auto`
    --to 0 disables automatic packing of loose objects.
    -+See the `gc.auto' option in the "CONFIGURATION" below for how this
    -+heuristic works.
    - +
    --If the number of packs exceeds the value of `gc.autoPackLimit`,
    --then existing packs (except those marked with a `.keep` file
    --or over `gc.bigPackThreshold` limit)
    --are consolidated into a single pack by using the `-A` option of
    --'git repack'.
    --If the amount of memory is estimated not enough for `git repack` to
    --run smoothly and `gc.bigPackThreshold` is not set, the largest
    --pack will also be excluded (this is the equivalent of running `git gc`
    --with `--keep-base-pack`).
    --Setting `gc.autoPackLimit` to 0 disables automatic consolidation of
    --packs.
    --+
    --If houskeeping is required due to many loose objects or packs, all
    -+Once housekeeping is triggered by exceeding the limits of
    -+configurations options such as `gc.auto` and `gc.autoPackLimit`, all
    - other housekeeping tasks (e.g. rerere, working trees, reflog...) will
    - be performed as well.
    +-These types of entries are generally created as
    +-a result of using `git commit --amend` or `git rebase` and are the
    +-commits prior to the amend or rebase occurring.  Since these changes
    +-are not part of the current project most users will want to expire
    +-them sooner, which is why the default is more aggressive than
    +-`gc.reflogExpire`.
    ++These types of entries are generally created as a result of using `git
    ++commit --amend` or `git rebase` and are the commits prior to the amend
    ++or rebase occurring.  Since these changes are not part of the current
    ++project most users will want to expire them sooner, which is why the
    ++default is more aggressive than `gc.reflogExpire`.
      
    + gc.rerereResolved::
    + 	Records of conflicted merge you resolved earlier are
 -:  ---------- >  6:  916433ef73 gc docs: note how --aggressive impacts --window & --depth
 4:  257aff2808 !  7:  457357b464 gc docs: downplay the usefulness of --aggressive
    @@ -3,19 +3,17 @@
         gc docs: downplay the usefulness of --aggressive
     
         The existing "gc --aggressive" docs come just short of recommending to
    -    users that they run it regularly. In reality it's a waste of CPU for
    -    most users, and may even make things actively worse. I've personally
    -    talked to many users who've taken these docs as an advice to use this
    -    option, and have.
    +    users that they run it regularly. I've personally talked to many users
    +    who've taken these docs as advice to use this option, and have.
    +    Usually it's (mostly) a waste of time.
     
    -    Let's change this documentation to better reflect reality, i.e. for
    -    most users using --aggressive is a waste of time, and may even be
    -    actively making things worse.
    +    So let's clarify what it really does, and let the user draw their own
    +    conclusions.
     
    -    Let's also clarify the "The effects [...] are persistent" to clearly
    -    note that that's true to the extent that subsequent gc's aren't going
    -    to re-roll existing packs generated with --aggressive into a new set
    -    of packs.
    +    Let's also clarify the "The effects [...] are persistent" to
    +    paraphrase a brief version of Jeff King's explanation at [1].
    +
    +    1. https://public-inbox.org/git/20190318235356.GK29661@xxxxxxxxxxxxxxxxxxxxx/
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
     
    @@ -23,27 +21,45 @@
      --- a/Documentation/git-gc.txt
      +++ b/Documentation/git-gc.txt
     @@
    - --aggressive::
    - 	Usually 'git gc' runs very quickly while providing good disk
      	space utilization and performance.  This option will cause
    --	'git gc' to more aggressively optimize the repository at the expense
    --	of taking much more time.  The effects of this optimization are
    + 	'git gc' to more aggressively optimize the repository at the expense
    + 	of taking much more time.  The effects of this optimization are
     -	persistent, so this option only needs to be used occasionally; every
     -	few hundred changesets or so.
    -+	'git gc' to more aggressively optimize the repository to save storage space
    -+	at the expense of taking much more time.
    -++
    -+Using this option may optimize for disk space at the expense of
    -+runtime performance. See the `--depth` and `--window` documentation in
    -+linkgit:git-repack[1]. It is not recommended that this option be used
    -+to improve performance for a given repository without running tailored
    -+performance benchmarks on it. It may make things better, or worse. Not
    -+using this at all is the right trade-off for most users and their
    -+repositories.
    -++
    -+The effects of this option are persistent to the extent that
    -+`gc.autoPackLimit` and friends don't cause a consolidation of existing
    -+pack(s) generated with this option.
    ++	mostly persistent. See the "AGGRESSIVE" section below for details.
      
      --auto::
      	With this option, 'git gc' checks whether any housekeeping is
    +@@
    + 	`.keep` files are consolidated into a single pack. When this
    + 	option is used, `gc.bigPackThreshold` is ignored.
    + 
    ++AGGRESSIVE
    ++----------
    ++
    ++When the `--aggressive` option is supplied, linkgit:git-repack[1] will
    ++be invoked with the `-f` flag, which in turn will pass
    ++`--no-reuse-delta` to linkgit:git-pack-objects[1]. This will throw
    ++away any existing deltas and re-compute them, at the expense of
    ++spending much more time on the repacking.
    ++
    ++The effects of this are mostly persistent, e.g. when packs and loose
    ++objects are coalesced into another pack the existing deltas in
    ++that pack might get re-used, but there are also various cases where we
    ++might pick a sub-optimal delta from a newer pack instead.
    ++
    ++Furthermore, supplying `--aggressive` will tweak the `--depth` and
    ++`--window` options passed to linkgit:git-repack[1]. See the
    ++`gc.aggressiveDepth` and `gc.aggressiveWindow` settings below. By
    ++using a larger window size we're more likely to find more optimal
    ++deltas.
    ++
    ++It's probably not worth it to use this option on a given repository
    ++without running tailored performance benchmarks on it. It takes a lot
    ++more time, and the resulting space/delta optimization may or may not
    ++be worth it. Not using this at all is the right trade-off for most
    ++users and their repositories.
    ++
    + CONFIGURATION
    + -------------
    + 
 -:  ---------- >  8:  d80a6021f5 gc docs: note "gc --aggressive" in "fast-import"
 -:  ---------- >  9:  a5d31faf6f gc docs: clarify that "gc" doesn't throw away referenced objects
 -:  ---------- > 10:  9fd1203ad5 gc docs: remove incorrect reference to gc.auto=0
-- 
2.21.0.360.g471c308f928



