On 1/4/07, Linus Torvalds <torvalds@xxxxxxxx> wrote:
Well, the good news is, I think we could probably split it up from within git. It's not fundamentally hard, although it is pretty damn expensive (and it would require the subproject support to do really well).
I was hoping that'd be possible at some point. I really want to split the submodules back out into first-class modules - one of my biggest misgivings about the current KDE repository setup is how everything is part of one gigantic repository.
So ignore that issue for now. I'd love to see the end result, if only because it sounds like you have a test-case for git that is four times bigger than the mozilla archive - even if it's just because of some really really stupid design decisions from the KDE SVN maintainers ;)
The full on-disk size of the KDE SVN repo is about 37GB, the last time I checked. It may be up to 38 or 39GB by now - I last ran rsync against the svn repo a few weeks ago. I'm only focusing on importing the first 409k revisions at the moment, because those comprise the commits that originally came from CVS and were imported into SVN. Almost immediately after the CVS import, coolo made some changes - moving all of the core KDE modules into /trunk/KDE, and their branches and tags into /branches/KDE and /tags/KDE respectively. This, I suspect, will end up making things "fun" for the other part of the import, which is another 200k revisions, give or take. So, yes, I suspect it's quite a bit larger than Mozilla.

I'm doing the conversion to git as a test so that I can show some numbers to the KDE guys; I'm not trying to campaign for a transition to git, but I think it's definitely worth exploring what such a world would look like. For me to make a compelling argument for an eventual project move to git, though, the git win32 support would need to be really good. (In KDE4, we're supporting Windows and OS X as well as X11 as first-class platforms.)
(But I would actually expect that KDE SVN uses SVN subprojects, so hopefully it's not _really_ one big repository. Of course, I don't know if SVN really does subprojects or how well it does them, so that's just a total guess).
I don't think so, but I'll ask coolo (the KDE SVN administrator).
The real problem with an SVN import is that I think SVN doesn't do merges right, so you can't import merge history properly (well, you can, if you decide that "properly" really means "SVN can't merge, so we can't really show it as merges in git either"). I think both git-svn and git-svnimport can _guess_ about merges, but it's just a heuristic, afaik. Whether it's a good one, I don't know.
Not too worried about the merges right now - as long as I have a rough approximation of what the original looked like, I'm pretty happy.
> Yeah. I haven't bothered hacking git-svnimport yet - but it looks like
> having it automatically repack every thousand revisions or so would
> probably be a pretty big win.

That, or making it use the same "fastimport" that the hacked-up CVS importer was made to use. Either way, somebody who understands SVN intimately (and probably perl) would need to work on it. That would not be me, so I can't really help ;)
Well, Shawn pointed me at the fastimport stuff, and I happen to know Perl reasonably well (I think), so I'll take a stab at it that way.
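For my own reference, the stream that fast-import consumes looks roughly like this, if I'm reading the docs right (the marks, author, and filename below are made up for illustration):

    blob
    mark :1
    data 24
    hello from the importer

    commit refs/heads/master
    mark :2
    committer A U Thor <author@example.com> 1167609600 +0000
    data 22
    import the first file

    M 100644 :1 README

The nice part is that "data" takes an exact byte count up front, so the generator never needs to escape anything - it just counts bytes and dumps them.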
> By default, if I had, say, one pack with the first 1000 revisions, and
> I imported another 1000, running 'git-repack' on its own would leave
> the first pack alone and create a new pack with just the second 1000
> revisions, right?

Yes. It's _probably_ better to do a full re-pack every once in a while (because if you have a lot of pack-files, eventually that ends up being problematic too), but as a first approximation, it's probably fine to just do a plain "git repack" every thousand commits, and then do a full big repack at the end.
Sounds like a good idea. Also sounds like it would be much less painful than the current situation, where it takes over nine hours to pack up all these revisions. :)
The big repack will still be pretty expensive, but it should be less painful than having everything unpacked. And at least the import won't have run with millions and millions of loose objects. So doing a "git repack -a -d" at the end is a good idea, and _maybe_ it could be done in the middle too for really big packs.
Okay, good to know.
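The way I picture that looking inside git-svnimport is something like this (a rough sketch - @revisions and import_revision() are placeholders for whatever the importer really does per commit):

    # repack incrementally during the import, then fully at the end
    my $count = 0;
    for my $rev (@revisions) {
        import_revision($rev);    # placeholder for the real per-revision work
        if (++$count % 1000 == 0) {
            # pack up only the loose objects created since the last repack
            system('git', 'repack') == 0
                or die "git repack failed: $?";
        }
    }
    # everything into a single pack, deleting the now-redundant ones
    system('git', 'repack', '-a', '-d') == 0
        or die "git repack -a -d failed: $?";

And if the intermediate packs pile up too much, an occasional "git repack -a -d" in the middle would fold them together, like you suggest.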
Again, doing what fastimport does avoids most of the whole issue, since it just generates a pack up-front instead. But that requires the importer to specifically understand about that kind of setup.
I'll definitely be investigating the fastimport option. Looks like I'll get to crack open some of my Perl books - haven't had to do that in a while. :)
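Just to convince myself the Perl side really is straightforward, here's a toy sketch of a generator for a stream like the one above (made-up names and timestamp again - a real importer would pull all of this out of the SVN history, and would have to count bytes rather than characters for non-ASCII content):

    #!/usr/bin/perl
    # Toy sketch: emit one blob and one commit in the fast-import
    # stream format; run from inside an empty repository as
    #   perl gen-stream.pl | git fast-import
    use strict;
    use warnings;

    sub data {
        # "data" takes an exact byte count followed by the raw bytes
        # (length() only counts bytes here because the payload is ASCII)
        my ($payload) = @_;
        return sprintf "data %d\n%s", length($payload), $payload;
    }

    my $blob = "hello from the importer\n";
    print "blob\n", "mark :1\n", data($blob), "\n";

    print "commit refs/heads/master\n", "mark :2\n";
    print "committer A U Thor <author\@example.com> 1167609600 +0000\n";
    print data("import the first file\n");
    print "M 100644 :1 README\n\n";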