On Wed, Sep 13, 2006 at 10:30:17PM -0400, Shawn Pearce wrote:
> I don't know exactly how big it is but the Gentoo CVS repository
> is also considered to be very large (about the size of the Mozilla
> repository) and just as difficult to import. Its either crashed or
> taken about a month to process with the current Git CVS->Git tools.

Ah, thanks for the tip.

> Since I know that the bulk of the Gentoo CVS repository is the
> portage tree I did a quick find|wc -l in my /usr/portage; its about
> 124,500 files.
>
> Its interesting that Gentoo has almost as large of a repository given
> that its such a young project, compared to NetBSD and Mozilla. :-)

Portage uses files and thus CVS very differently, though. Each ebuild
for each package revision of each version of a third-party package
(like, say, monotone 0.28 and 0.29, and -r1, -r2 pkg bumps of those if
they were needed) is its own file that's added, maybe edited a couple
of times, and then deleted again later as new versions are added and
older ones retired. These are copies and renames in the workspace, but
are invisible to CVS.

This uses up lots more files than a single long-lived build that gets
edited each time; the Attic dirs must have huge numbers of files, way
beyond the number that are live now.

This lets portage keep builds around in a HEAD checkout for multiple
versions at once, tagged internally with different statuses.
Effectively, these tags take the place of VCS-based branches and
releases, and are more flexible for end users tracking their favourite
applications while keeping the rest of their system stable.

If they had a VCS that supported file cloning and/or renaming, and used
that to follow history between these ebuild files, things would be very
different. There are some interesting use cases for VCS tools in
supporting this behaviour nicely, too.

--
Dan.