On Wed, Oct 6, 2010 at 11:32 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> +Performance concerns
>> +--------------------
>> +
>> +Git is written with performance in mind and it works extremely well
>> +with its typical repositories (i.e. source code repositories, with
>> +a moderate number of small text files, possibly with long history).
>> +Non-typical repositories (a lot of files, or very large files...)
>> +may experience mild performance degradation. This section describes
>> +how Git behaves in such repositories and how to reduce impact.
>> +
>
> I have seen this "mild" suggested in the discussion, but do we want any
> adjective here?  The runtime for, say, "git log" from the tip to the root
> obviously would grow proportionally to the length of the history, i.e. the
> number of records you would want to see, and it may not be "mild" if your
> history is very deep.  Same for the runtime for "git diff" in a wide
> project with many changed paths.

I don't want to give an impression that the sky will fall when someone
puts a 200MB file in his repo.

> More importantly, what is "degradation"?  It is not a degradation if "git
> log" took 100x as long for a project with 100k commits compared to a
> similar project with 1k commits.

From my perspective, git commands that are instant in typical repos
should still be instant in non-typical ones. Yes "git add hugefile" will
take longer than "git add git.c", but it should not take, say, 1 hour
for that command. It's hard to draw a clear line here.

> If you do not have enough core to hold the part of the ancestry graph that
> is involved to compute "git log A..B" to show a gazillion commits, it will
> eat into the swap, take a lot more time than it takes "git log B" to show
> the same number of commits.  That _is_ degradation, and I suspect it won't
> be mild at all.
>
>> +For repositories with a large number of files (~50k files or more),
>
> How did you come up with this 50k number?

Quite unscientific. I started with gentoo-x86 (~130k files), on which I
know git performs less than satisfactorily. I also looked at how big other
repos are (wine.git, linux-2.6.git...), then chose a number in the middle.

>> +but you only need a few of them present in working tree, you can use
>> +sparse checkout (see linkgit:git-read-tree[1], section 'Sparse
>> +checkout').
>
> Is "sparse checkout" a real feature that has been made usable by mere
> mortals, battle tested, and shown to be reliable?

Hopefully. In the 2010 survey, there were 331 answers saying they use
"partial (sparse) checkout". I hope that they used this feature, not
something else.

> It feels funny that we have to refer to the documentation of plumbing
> read-tree when the key verb in this paragraph is "checkout".  With the
> current documentation set, you can follow read-tree page that mentions
> some magic called skip-worktree-bit, get tempted to jump to update-index
> page and get lost in the implementation details of the feature, which is
> irrelevant to the end user.  If you resisted the temptation and keep
> reading read-tree page, you see the description of info/sparse-checkout to
> learn how to control the feature, but it does not come with an
> easy-to-follow example.  A few concrete suggestions to "Sparse checkout"
> section in read-tree:
>
> ...
>

Hmm.. yeah. Will do something.

> I think the suggestion to use Sparse checkout in git(1)---i.e. your patch
> we are discussing here, is a bit premature before the above happens.
>
>> +... If you need all of them present in working tree, but you
>> +know in advance only a few of them may be modified, please consider
>> +using assume-unchanged bit (see linkgit:git-update-index[1]).
>> +... The following commands are
>> +however known to do full index refresh in some cases:
>
> It is "need to", not "are known to", isn't it?

In the case of "git commit", as you said in another mail, index refresh
is needed because of the post-commit hook.
If there are no hooks, I think index refresh can be skipped. But yes,
probably "need to".

>> +Some commands need entire file content in memory to process.
>> +Files that have size a significant portion of physical RAM may
>> +affect performance. You may want to avoid using the following
>> +commands if possible on such large files:
>
> "If possible" is not a good excuse.  How would one _avoid_ checkout of a
> file if one wants to use it?  You can't.  Similarly to "diff".  This
> advice is pretty much useless, isn't it?  It's not much better than saying
> "if your machine has too little RAM, things will get slow---deal with it".

That's more of a bug acknowledgement, or a list of to-be-improved TODOs.
I didn't want to say that out loud. Should I?
--
Duy