pclouds@xxxxxxxxx writes:

> From: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
>
> ---
> Revised version. I dropped shallow clone because it does not really
> relate to performance.
>
> Documentation/git.txt | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 41 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index dd57bdc..129947f 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -729,6 +729,47 @@ The index is also capable of storing multiple entries (called "stages")
>  for a given pathname. These stages are used to hold the various
>  unmerged version of a file when a merge is in progress.
>
> +Performance concerns
> +--------------------
> +
> +Git is written with performance in mind and it works extremely well
> +with its typical repositories (i.e. source code repositories, with
> +a moderate number of small text files, possibly with long history).
> +Non-typical repositories (a lot of files, or very large files...)
> +may experience mild performance degradation. This section describes
> +how Git behaves in such repositories and how to reduce impact.
> +

I have seen this "mild" suggested in the discussion, but do we want any
adjective here?  The runtime for, say, "git log" from the tip to the root
obviously would grow proportionally to the length of the history, i.e. the
number of records you would want to see, and it may not be "mild" if your
history is very deep.  Same for the runtime for "git diff" in a wide
project with many changed paths.

More importantly, what is "degradation"?  It is not a degradation if "git
log" took 100x as long for a project with 100k commits compared to a
similar project with 1k commits.

If you do not have enough core to hold the part of the ancestry graph that
is involved in computing "git log A..B" to show a gazillion commits, it
will eat into the swap and take a lot more time than it takes "git log B"
to show the same number of commits.
That _is_ degradation, and I suspect it won't be mild at all.

> +For repositories with a large number of files (~50k files or more),

How did you come up with this 50k number?

> +but you only need a few of them present in working tree, you can use
> +sparse checkout (see linkgit:git-read-tree[1], section 'Sparse
> +checkout').

Is "sparse checkout" a real feature that has been made usable by mere
mortals, battle tested, and shown to be reliable?

It feels funny that we have to refer to the documentation of plumbing
read-tree when the key verb in this paragraph is "checkout".

With the current documentation set, you can follow the read-tree page that
mentions some magic called skip-worktree-bit, get tempted to jump to the
update-index page and get lost in the implementation details of the
feature, which are irrelevant to the end user.  If you resist the
temptation and keep reading the read-tree page, you see the description of
info/sparse-checkout to learn how to control the feature, but it does not
come with an easy-to-follow example.

A few concrete suggestions for the "Sparse checkout" section in read-tree:

 - Move the section to a separate file and include it in the read-tree
   page, so that we can include it later in the checkout page;

 - Drop the first paragraph;

 - Move the second and third paragraphs, which still describe the
   machinery more than the usage, much later in the section;

 - Start the section with the description of info/sparse-checkout; the
   first sentence ("while ... is usually used") needs to be rewritten,
   because (1) it is not a complete sentence and is grammatically
   incorrect, and (2) it reads as if you will say an alternative file can
   be used instead of info/sparse-checkout, which is not what you wanted
   to do; perhaps

   "$GIT_DIR/info/sparse-checkout is used to specify which paths are to be
   (and not to be) checked out.  It lists glob patterns to match paths to
   be checked out.  Prefix the pattern with a '!' to specify a pattern to
   match paths not to be checked out.
   Note: a bug in the implementation requires you to end a pattern with a
   trailing slash to match a directory".

 - Show examples; not just samples of what the contents of that control
   file look like, but also a concrete command sequence (e.g. (1) run
   "git clone -n", (2) edit info/sparse-checkout to contain this, (3) run
   "git checkout", (4) here is how to widen/narrow the sparse
   checkout--first edit info/sparse-checkout to look like this and then
   run "git xxx" to match the updated definition, etc.).

 - Drop the BUGS section from the read-tree documentation (but see Note:
   above in my example); the bug mentioned there is not a bug of
   read-tree, but is a bug in the sparse-checkout feature.

I think the suggestion to use Sparse checkout in git(1)---i.e. your patch
we are discussing here---is a bit premature before the above happens.

> +... If you need all of them present in working tree, but you
> +know in advance only a few of them may be modified, please consider
> +using assume-unchanged bit (see linkgit:git-update-index[1]).
> +... The following commands are
> +however known to do full index refresh in some cases:

It is "need to", not "are known to", isn't it?

> +Some commands need entire file content in memory to process.
> +Files that have size a significant portion of physical RAM may
> +affect performance. You may want to avoid using the following
> +commands if possible on such large files:

"If possible" is not a good excuse.  How would one _avoid_ checkout of a
file if one wants to use it?  You can't.  The same goes for "diff".

This advice is pretty much useless, isn't it?  It's not much better than
saying "if your machine has too little RAM, things will get slow---deal
with it".

> +* All checkout commands (linkgit:git-checkout[1],
> +  linkgit:git-checkout-index[1], linkgit:git-read-tree[1],
> +  linkgit:git-clone[1]...)
> +* All diff-related commands (linkgit:git-diff[1],
> +  linkgit:git-log[1] with diff, linkgit:git-show[1] on commits...)
> +* All commands that need file conversion processing (see
> +  linkgit:gitattributes[5])
> +
>  Authors
>  -------
>  * git's founding father is Linus Torvalds <torvalds@xxxxxxxx>.
> --
> 1.7.0.2.445.gcbdb3
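
As a rough illustration of the command sequence asked for in the "Show
examples" item above, here is a minimal sketch of steps (1) to (3).  It is
not text from the patch or from the read-tree page, and it rests on a few
assumptions: the repository layout (docs/ and code/), the temporary paths
and the committer identity are made up for the example, and the
core.sparseCheckout setting is not mentioned in this message but is, as far
as I know, needed for info/sparse-checkout to be honored.  Step (4),
widening or narrowing, is left out because the message leaves that command
as "git xxx".

```shell
#!/bin/sh
set -e

# Throwaway locations; every name below is hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
src="$tmp/src-repo"
work="$tmp/work"

# Build a small source repository with two top-level directories.
git init -q "$src"
mkdir "$src/docs" "$src/code"
echo "manual" > "$src/docs/manual.txt"
echo "int main(void) { return 0; }" > "$src/code/main.c"
git -C "$src" add .
git -C "$src" -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m "initial"

# (1) Clone without populating the working tree.
git clone -q -n "$src" "$work"

# (2) Turn the feature on (assumed to be required) and list the paths to
#     check out; the directory pattern ends with a slash.
git -C "$work" config core.sparseCheckout true
mkdir -p "$work/.git/info"
echo "docs/" > "$work/.git/info/sparse-checkout"

# (3) Populate the working tree; only docs/ should be checked out.
git -C "$work" checkout -q

ls "$work"
```

The pattern is written as "docs/" rather than "docs" to follow the Note
above about ending directory patterns with a trailing slash.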