2010/10/5 Chris Packham <judge.packham@xxxxxxxxx>:
> On 05/10/10 06:00, Nguyễn Thái Ngọc Duy wrote:
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
>> ---
>>  I wanted to make a more detailed description, per command. It would
>>  serve as guidance for people on special repos, also as TODOs for Git
>>  developers. But that seems like a lot of work, analyzing each command.
>>
>>  Instead I made this text to warn users where performance may decrease,
>>  and to point them to features that might help. Did I miss anything?
>>
>>  There were discussions in the past on maintaining large files out-of-repo,
>>  and symlinks to them in-repo. That sounds like good advice, doesn't it?
>>
>>  Documentation/git.txt |   46 ++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 46 insertions(+), 0 deletions(-)
>>
>> diff --git a/Documentation/git.txt b/Documentation/git.txt
>> index dd57bdc..8408923 100644
>> --- a/Documentation/git.txt
>> +++ b/Documentation/git.txt
>> @@ -729,6 +729,52 @@ The index is also capable of storing multiple entries (called "stages")
>>  for a given pathname.  These stages are used to hold the various
>>  unmerged version of a file when a merge is in progress.
>>
>> +Performance concerns
>> +--------------------
>> +
>> +Git is written with performance in mind and it works extremely well
>> +with its typical repositories (i.e. source code repositories, with
>> +a moderate number of small text files, possibly with long history).
>> +Non-typical repositories (huge number of files, or very large
>> +files...) may experience performance degradation. This section describes

Probably should have written "experience mild performance degradation"

>> +how Git behaves in such repositories and how to reduce impact.
>
> How huge is "huge" and how large is "large"? From previous threads on
> this list I'm guessing "large" is files bigger than physical RAM. I've

A significant portion of RAM is enough to start swapping. There's also
a hard limit imposed by mmap(): a file cannot be larger than the
available address space (2-3G on x86, probably larger on x86_64).

> not really run into a situation where a huge number of files causes
> performance problems.

gentoo-x86 has ~100k files. Cold-cache time is definitely long. Even
with a hot cache, a full cache refresh may take, I don't remember,
half a second or so. It depends on many factors. I don't think I can
draw a clear limit.

>
> Maybe there should be a distinction of where a user might see
> performance problems e.g. initial clone, subsequent fetches, commit,
> checkout or diff.
>
>> +
>> +For repositories with really long history, you may want to work on
>> +a shallow clone of it (see linkgit:git-clone[1], option '--depth').
>> +A shallow repository does not contain full history, so it may consume
>> +less disk space and network bandwidth. On the other hand, you cannot
>> +fetch from it. And obviously you cannot look further back than what
>> +it has in history (you can deepen history though).
>
> You might want to mention git clone --reference and the
> .git/objects/info/alternates for those concerned with disk usage.

Thanks.

>
>> +
>> +For repositories with a large number of files, but you only need
>> +a few of them present in working tree, you can use sparse checkout
>> +(see linkgit:git-read-tree[1], section 'Sparse checkout'). Sparse
>> +checkout can be used with either a normal repository, or a shallow
>> +one.
>> +
>> +Git uses lstat(3) to detect changes in working tree. A huge number
>> +of lstat(3) calls may impact performance, especially on systems with
>> +slow lstat(3). In some cases you can reduce the number of lstat(3)
>> +calls by specifying what directories you are interested in, so no
>> +lstat(3) outside is needed.
>> +
>> +For repositories with a large number of files, you need all of them
>> +present in working tree, but you know in advance only a few of them
>> +may be modified, please consider using assume-unchanged bit (see
>> +linkgit:git-update-index[1]). This helps reduce the number of lstat(3)
>> +calls.
>> +
>> +Some Git commands need entire file content in memory to process.
>> +You may want to avoid using them if possible on large files. Those
>> +commands include:
>> +
>> +* All checkout commands (linkgit:git-checkout[1],
>> +  linkgit:git-checkout-index[1], linkgit:git-read-tree[1],
>> +  linkgit:git-clone[1]...)
>> +* All diff-related commands (linkgit:git-diff[1],
>> +  linkgit:git-log[1] with diff, linkgit:git-show[1] on commits...)
>> +* All commands that need file conversion processing
>> +
>
> This addresses one of my comments above. It might be worth talking about
> using git bundles as an alternative to cloning over unreliable connections.

Thanks.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
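
As a rough illustration of the features discussed above (shallow clone,
sparse checkout and the assume-unchanged bit), the commands might look
like the sketch below. The repository URL, the depth value and the paths
are placeholder examples, not values taken from the thread:

    # Shallow clone: fetch only recent history (linkgit:git-clone[1], '--depth')
    git clone --depth 50 git://example.com/repo.git
    cd repo

    # Sparse checkout: keep only the paths you work on in the working tree
    # (linkgit:git-read-tree[1], section 'Sparse checkout')
    git config core.sparseCheckout true
    echo "Documentation/" >> .git/info/sparse-checkout
    git read-tree -mu HEAD

    # Assume-unchanged: tell Git it may skip lstat(3) on paths you will not modify
    # (linkgit:git-update-index[1])
    git update-index --assume-unchanged path/to/large-file

For the disk-usage concern raised above, an existing local clone can also
be reused as an object source with 'git clone --reference /path/to/local/repo <url>',
which populates .git/objects/info/alternates instead of copying objects.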