On Sun, Apr 05, 2009 at 02:05:36AM +0200, Nicolas Sebrecht wrote:
> > Our full repository conversion is large, even after tuning the
> > repacking, the packed repository is between 1.4 and 1.6GiB. As of February
> > 4th, 2009, it contained 4886949 objects. It is not suitable for
> > splitting into submodules either unfortunately - we have a lot of
> > directory moves that would cause submodule bloat.
> Actually, I'm not sure that a full portage tree repository would be the
> best thing to do. It would not be suitable in the long term and working
> on the repository/history would be a big mess. Why provide a such repo ?
> Or at least, why provide a such readable repo ?
>
> IMHO, you should provide a repository per upstream package on the main
> server.
That causes incredible bloat, unfortunately. I'll summarize why here for
the git mailing list.

Most of our developers have the entire tree checked out and, in informal
surveys, say they would like to continue to do so. There are ~13500
packages right now (excluding eclasses/, profiles/ and scripts/), growing
by 15-25 new packages per week. (~45% of packages also have a files/
directory.)

For each package, the .git directory, assuming a single pack, consumes at
least 36 inodes. Tail packing is limited to Reiserfs3 and JFS and isn't
widely used beyond those, so assuming 4KiB per inode, that's an overhead
of at least 144KiB per package. Multiply by the number of packages and we
get an overhead of about 2GiB before we've added ANY content.

Without tail packing, the Gentoo tree is presently around 520MiB (you can
fit it into ~190MiB with tail packing). This means that repo-per-package
would have an overhead in the range of 400%.

Additionally, there's a lot of commonality between ebuilds and packages,
and repo-per-package means that the compression algorithms can't make use
of it - dictionary algorithms are effective at compression for a reason.

Overhead is the reason that we refused to migrate to SVN as well.
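
As a rough check of the repo-per-package estimate above, here is a small
sketch of the arithmetic. The function name and constants are only for
illustration; the inputs (13500 packages, 36 inodes per .git directory,
4KiB per inode, a 520MiB tree) are the figures quoted in this message.
It comes out at roughly 1.85GiB, which I round to 2GiB, and about 365%
of the tree size, i.e. in the range of 400%.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def repo_per_package_overhead(packages, inodes_per_repo=36, bytes_per_inode=4 * KIB):
    """Minimum .git overhead in bytes if every package had its own repository.

    Assumes no tail packing, so every inode costs a full 4KiB.
    """
    return packages * inodes_per_repo * bytes_per_inode


overhead = repo_per_package_overhead(13500)  # ~13500 packages in the tree
tree_size = 520 * MIB                        # tree size without tail packing

print("per-package overhead: %d KiB" % (36 * 4))
print("total overhead: %.2f GiB" % (overhead / GIB))
print("overhead relative to tree: %.0f%%" % (100 * overhead / tree_size))
```
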
- CVS, per each directory of data, has a constant overhead of 4 inodes
  (CVS/ CVS/Root CVS/Repository CVS/Entries)
- SVN, for each data directory, has another complete copy of the data,
  plus a minimum of 10 other inodes.
- Git costs a minimum of 36 inodes per repository. In a fully packed repo,
  the number of inodes tends to stay below 50 in all cases.

> PS: what about cc'ing gentoo-scm list ?
It's not an open-posting list, so anybody here on the git list simply
replying would not get their post on there. The issue has been raised
there, and this is mainly meant to find a resolution to that problem.

--
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail   : robbat2@xxxxxxxxxx
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85