Hi,

This is the first in a series of mails, over the next few days, on issues we've run into while planning a potential migration of Gentoo's repository to Git.

Our full repository conversion is large: even after tuning the repacking, the packed repository is between 1.4 and 1.6 GiB. As of February 4th, 2009, it contained 4886949 objects. Unfortunately, it is not suitable for splitting into submodules either - we have a lot of directory moves that would cause submodule bloat.

During an initial clone, I see that git-upload-pack invokes pack-objects, despite the ENTIRE repository already being packed - no loose objects whatsoever. git-upload-pack then seems to buffer the result in memory. In a small repository this wouldn't be a problem, as the whole repository fits in memory very easily. With our large repository, however, git-upload-pack and git-pack-objects grow to well more than the size of the packed repository, and are usually killed by the OOM killer.

During 'remote: Counting objects: 4886949, done.', git-upload-pack peaks at 2474216KB VSZ and 1143048KB RSS. Shortly thereafter, at 'remote: Compressing objects: 0% (1328/1994284)', git-pack-objects sits at ~2.8GB VSZ and ~1.8GB RSS; this is also where the CPU burn starts. On our test server machine (w/ git 1.6.0.6), it takes about 200 minutes of walltime to finish the pack, provided the OOM killer doesn't kick in first.

Given that the repo is entirely packed already, I see no point in doing this. For the initial clone, can the git-upload-pack algorithm please send the existing packs as-is, and only generate a pack containing the objects that aren't packed yet? Sketches of how we verified the packed state and how the memory figures can be reproduced follow below.
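For reference, here is a minimal sketch of how the fully-packed state can be confirmed. The repository path is hypothetical, and the repack flags are just one plausible tuning, not necessarily the exact ones we used:

    # Hypothetical location of the converted repository.
    $ cd /var/git/gentoo.git

    # Repack everything into a single pack: -a takes all reachable
    # objects, -d drops the now-redundant old packs and loose objects,
    # -f recomputes deltas. Depth/window values are illustrative.
    $ git repack -a -d -f --depth=50 --window=100

    # With no loose objects left, "count" and "size" should both
    # report 0, and every object should show up under "in-pack".
    $ git count-objects -v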
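Similarly, a sketch of how the VSZ/RSS figures quoted above can be sampled on the server side while a clone is in progress; the clone URL is hypothetical:

    # Client side: kick off the initial clone.
    $ git clone git://git.example.org/gentoo.git

    # Server side: sample the pack-generation processes every 10
    # seconds. ps reports VSZ/RSS in KB.
    $ watch -n 10 "ps axo pid,vsz,rss,etime,args | grep -E 'upload-pack|pack-objects' | grep -v grep"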
--
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail   : robbat2@xxxxxxxxxx
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85