This is fantastic. I really appreciate all the answers, and it's great that I seem to have sparked some general discussion that could lead somewhere too.

Notes:
* I'm currently using 2.1.3. I'll move to 2.3.x.
* I'm experimenting with git-annex to reduce repo size on disk. We'll see.
* I could remove all tags older than /n/ years in the active repo and just maintain them in the historical repo. (We have quite a lot of CI-generated tags.) It sounds like that might improve performance. (A rough command sketch for this is at the end of this message.)

Questions:

1. Ævar: I'm a bit concerned by your statement that git rebases take about 1-2 s per commit. Does that mean that a "git pull --rebase" that picks up, say, 120 commits (not at all unrealistic) could potentially take 4 minutes to complete? Or have I misinterpreted your comment?

2. I'd not heard about bitmap indexes before this thread, but it sounds like they should help me. In limited searching I can't find much useful documentation about them. It is also not clear to me whether I have to explicitly run "git repack --write-bitmap-index" or whether git will automatically detect when bitmaps are needed; first experiments seem to indicate that I need to generate them explicitly. I assume that once the index is there, git will just use it automatically. (See the second sketch at the end of this message.)

Steve


On Thu, Feb 19, 2015 at 7:03 PM, brian m. carlson
<sandals@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Feb 19, 2015 at 04:26:58PM -0500, Stephen Morton wrote:
>> I posted this to comp.version-control.git.user and didn't get any response. I think the question is plumbing-related enough that I can ask it here.
>>
>> I'm evaluating the feasibility of moving my team from SVN to git. We have a very large repo. [1] We will have a central repo using GitLab (or similar) that everybody works with. Forks, code sharing, pull requests etc. will be done through this central server.
>>
>> By 'performance', I guess I mean speed of day-to-day operations for devs.
>>
>> * (Obviously, trivially, a (non-local) clone will be slow with a large repo.)
>> * Will a few simultaneous clones from the central server also slow down other concurrent operations for other users?
>
> This hasn't been a problem for us at $DAYJOB. Git doesn't lock anything on fetches, so each process is independent. We probably have about sixty developers (and maybe twenty other occasional users) that manage to interact with our Git server all day long. We also have probably twenty smoker (CI) systems pulling at two-hour intervals, or, when there's nothing to do, every two minutes, plus probably fifteen to twenty build systems pulling hourly.
>
> I assume you will provide adequate resources for your server.
>
>> * Will 'git pull' be slow?
>> * 'git push'?
>
> The most pathological case I've seen for git push is a branch with a single commit merged into the main development branch. As of Git 2.3.0, the performance regression here is fixed.
>
> Obviously, the speed of your network connection will affect this. Even at 30 MB/s, cloning several gigabytes of data takes time. Git tries hard to avoid sending a lot of data, so if your developers keep reasonably up-to-date, the cost of establishing the connection will tend to dominate.
>
> I see pull and push times that are less than 2 seconds in most cases.
>
>> * 'git commit'? (It is listed as slow in reference [3].)
>> * 'git status'? (Slow again in reference 3 though I don't see it.)
>
> These can be slow with slow disks or over remote file systems. I recommend not doing that.
> I've heard rumbles that disk performance is better on Unix, but I don't use Windows so I can't say.
>
> You should keep your .gitignore files up-to-date to avoid enumerating untracked files. There's some work towards making this less of an issue.
>
> git blame can be somewhat slow, but it's not something I use more than about once a day, so it doesn't bother me that much.
>
>> Assuming I can put lots of resources into a central server with lots of CPU, RAM, fast SSD, fast networking, what aspects of the repo are most likely to affect devs' experience?
>> * Number of commits
>> * Sheer disk space occupied by the repo
>
> The number of files can impact performance due to the number of stat()s required.
>
>> * Number of tags.
>> * Number of branches.
>
> The number of tags and branches individually is really less relevant than the total number of refs (tags, branches, remote branches, etc). Very large numbers of refs can impact performance on pushes and pulls due to the need to enumerate them all.
>
>> * Binary objects in the repo that cause it to bloat in size [1]
>> * Other factors?
>
> If you want good performance, I'd recommend the latest version of Git both client- and server-side. Newer versions of Git provide pack bitmaps, which can dramatically speed up clones and fetches, and Git 2.3.0 fixes a performance regression with large numbers of refs in non-shallow repositories.
>
> It is totally worth it to roll your own packages of git if your vendor provides old versions.
>
>> Of the various HW items listed above --CPU speed, number of cores, RAM, SSD, networking-- which is most critical here?
>
> I generally find that having a good disk cache is important with large repositories. It may be advantageous to make sure the developer machines have adequate memory. Performance is notably better on development machines (VMs) with 2 GB or 4 GB of memory instead of 1 GB.
>
> I can't speak to the server side, as I'm not directly involved with its deployment.
>
>> Assume ridiculous numbers. Let me exaggerate: say 1 million commits, 15 GB repo, 50k tags, 1,000 branches. (Due to historical code fixups, another 5,000 "fix-up branches" which are just one little dangling commit required to change the code a little bit between a commit and a tag that was not quite made from it.)
>
> I routinely work on a repo that's 1.9 GB packed, with 25k (and rapidly growing) refs. Other developers work on a repo that's 9 GB packed, with somewhat fewer refs. We don't tend to have problems with this.
>
> Obviously, performance is better on some of our smaller repos, but it's not unacceptable on the larger ones. I generally find that the 940 KB repo with huge numbers of files performs worse than the 1.9 GB repo with somewhat fewer. If you can split your repository into multiple logical repositories, that will certainly improve performance.
>
> If you end up having pain points, we're certainly interested in working through those. I've brought up performance problems and people are generally responsive.
> --
> brian m. carlson / brian with sandals: Houston, Texas, US
> +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
> OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187
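
P.S. For the tag cleanup mentioned in my notes above, this is only a sketch of what I have in mind: "historical" is a placeholder for a remote pointing at the archive repo, and "old-ci-tag" stands in for one of our CI-generated tag names.

    # List tags with their creation dates to help pick a cutoff
    git for-each-ref --sort=creatordate --format='%(creatordate) %(refname:short)' refs/tags

    # Make sure the archive repo has every tag before deleting anything
    git push historical --tags

    # Then drop an old tag from the active repo, locally and on the central server
    git tag -d old-ci-tag
    git push origin :refs/tags/old-ci-tag
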
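And for question 2: as far as I can tell, the bitmap has to be generated explicitly (or enabled via config) on the repo that serves the clones and fetches; once the .bitmap file exists, git picks it up automatically. A minimal sketch, assuming the server-side repo is bare and runs git 2.1 or newer (so the repack.writeBitmaps config key is available):

    # One-off: repack everything into a single pack and write a bitmap next to it
    git repack -a -d --write-bitmap-index

    # Ongoing: have future repacks (and git gc) regenerate the bitmap
    git config repack.writeBitmaps true

    # Verify: a *.bitmap file should now sit beside the pack
    ls objects/pack/*.bitmap     # (.git/objects/pack/ in a non-bare repo)

Nothing needs to change on the client side; the bitmap only speeds up the serving side's object counting during clones and fetches.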