[RFC PATCH] git.txt: document limitations on non-typical repos (and hints)

Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> · Wed, 6 Oct 2010 21:21:38 +0700

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
---
 I wanted to write a more detailed, per-command description. It would
 serve as guidance for people with special repos, and as TODOs for Git
 developers. But analyzing each command seems like a lot of work.

 Instead I wrote this text to warn users where performance may degrade,
 and to point them to features that might help. Did I miss anything?
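
 For reviewers who want to try the hinted features, here is a rough
 sketch on a throwaway repository. The directory names, file names and
 the demo identity are made up for illustration, and it assumes a POSIX
 shell with a reasonably recent git on the PATH. It only shows that each
 feature behaves as the text says; it does not measure performance.

```shell
#!/bin/sh
# Sketch: exercise the hinted features on a throwaway repository.
set -e

work=$(mktemp -d)
cd "$work"

# Hypothetical identity so the demo commits do not depend on user config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a small origin: two directories, three commits.
git init -q origin
(
  cd origin
  mkdir src docs
  for i in 1 2 3; do
    echo "rev $i" > src/main.c
    echo "rev $i" > docs/manual.txt
    git add src docs
    git commit -q -m "commit $i"
  done
)

# Hint 1: shallow clone. --depth is ignored for plain local paths,
# so a file:// URL is used to go through the normal transport.
git clone -q --depth 1 "file://$work/origin" shallow
shallow_commits=$(cd shallow && git rev-list --count HEAD)
echo "commits visible in shallow clone: $shallow_commits"

# Hint 2: sparse checkout. Keep only src/ in the working tree.
git clone -q origin sparse
(
  cd sparse
  git config core.sparseCheckout true
  mkdir -p .git/info
  echo "src/" > .git/info/sparse-checkout
  git read-tree -mu HEAD
)

# Hint 3: assume-unchanged. Git stops checking the file, so a local
# edit to it is no longer reported.
(
  cd origin
  git update-index --assume-unchanged docs/manual.txt
  echo "local edit" >> docs/manual.txt
)
status_output=$(cd origin && git status --porcelain)

# Hint 4: limit a command to the directory you care about.
echo "rev 4 with a local change" > origin/src/main.c
changed=$(cd origin && git diff --name-only -- src/)
echo "changed under src/: $changed"
```

 Note that hint 3 hides real modifications, so it only makes sense for
 files you are sure you will not touch.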

 There were discussions in the past about keeping large files out-of-repo,
 with symlinks to them in-repo. That sounds like good advice, doesn't it?
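
 To make the symlink idea concrete, a minimal sketch follows. The
 "assets" directory and "video.bin" are hypothetical names, and it
 assumes a filesystem with symlink support. Git records only the link
 target as a small blob with mode 120000, so the large file itself
 never enters the object database.

```shell
#!/bin/sh
# Sketch: keep a large file outside the repository, track a symlink to it.
set -e

work=$(mktemp -d)
cd "$work"

# Hypothetical identity so the demo commit does not depend on user config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# The large file lives next to the repository, not inside it.
mkdir assets
dd if=/dev/zero of=assets/video.bin bs=1024 count=1024 2>/dev/null

git init -q repo
(
  cd repo
  ln -s ../assets/video.bin video.bin
  git add video.bin
  git commit -q -m "Track video.bin as a symlink to out-of-repo storage"
)

# The index entry is a symlink (mode 120000) whose blob is just the target path.
mode=$(cd repo && git ls-files -s video.bin | cut -d' ' -f1)
stored=$(cd repo && git cat-file -p HEAD:video.bin)
echo "mode=$mode stored=$stored"
```

 The obvious cost is that the large file is no longer versioned or
 transferred by Git, so it has to be distributed some other way.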

 Documentation/git.txt |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index dd57bdc..8408923 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -729,6 +729,52 @@ The index is also capable of storing multiple entries (called "stages")
 for a given pathname.  These stages are used to hold the various
 unmerged version of a file when a merge is in progress.
 
+Performance concerns
+--------------------
+
+Git is written with performance in mind and works extremely well
+with typical repositories (i.e. source code repositories with
+a moderate number of small text files, possibly with a long history).
+Non-typical repositories (a huge number of files, or very large
+files) may suffer performance degradation. This section describes
+how Git behaves in such repositories and how to reduce the impact.
+
+For repositories with a really long history, you may want to work on
+a shallow clone (see linkgit:git-clone[1], option '--depth').
+A shallow repository does not contain the full history, so it may
+consume less disk space and network bandwidth. On the other hand,
+others cannot fetch from it, and you cannot look further back than
+the history it contains (you can deepen the history later, though).
+
+For repositories with a large number of files, of which you only need
+a few present in the working tree, you can use sparse checkout
+(see linkgit:git-read-tree[1], section 'Sparse checkout'). Sparse
+checkout can be used with either a normal repository or a shallow
+one.
+
+Git uses lstat(3) to detect changes in the working tree. A huge number
+of lstat(3) calls may hurt performance, especially on systems where
+lstat(3) is slow. In some cases you can reduce the number of lstat(3)
+calls by specifying which directories you are interested in, so that
+no lstat(3) is needed outside them.
+
+If a repository has a large number of files, all of which must be
+present in the working tree, but you know in advance that only a few
+may be modified, consider using the assume-unchanged bit (see
+linkgit:git-update-index[1]). This helps reduce the number of lstat(3)
+calls.
+
+Some Git commands need the entire file content in memory to process it.
+You may want to avoid using them on large files if possible. Those
+commands include:
+
+* All checkout commands (linkgit:git-checkout[1],
+  linkgit:git-checkout-index[1], linkgit:git-read-tree[1],
+  linkgit:git-clone[1]...)
+* All diff-related commands (linkgit:git-diff[1],
+  linkgit:git-log[1] with diff, linkgit:git-show[1] on commits...)
+* All commands that need file conversion processing
+
 Authors
 -------
 * git's founding father is Linus Torvalds <torvalds@xxxxxxxx>.
-- 
1.7.0.2.445.gcbdb3