On Wed, 3 Jan 2007, Chris Lee wrote:
>
> So I'm using git 1.4.1, and I have been experimenting with importing
> the KDE sources from Subversion using git-svnimport.

As one single _huge_ import? All the sub-projects together? I have to say,
that sounds pretty horrid.

> First issue I ran into: On a machine with 4GB of RAM, when I tried to
> do a full import, git-svnimport died after 309906 revisions, saying
> that it couldn't fork.
>
> Checking `top` and `ps` revealed that there were no git-svnimport
> processes doing anything, but all of my 4G of RAM was still marked as
> used by the kernel. I had to do sysctl -w vm.drop_caches=3 to get it
> to free all the RAM that the svn import had used up.

I think that was just all cached, and all ok. The reason you didn't see any
git-svnimport was that it had died off already, and all your memory was
just caches. You could just have left it alone, and the kernel would have
started re-using the memory for other things even without any
"drop_caches".

But what you did there didn't make anything worse, it just likely had no
real impact.

However, it does sound like git-svnimport probably acts like git-cvsimport
used to, and just keeps too much in memory - so it's never going to act
really nicely..

It also looks like git-svnimport never repacks the repo, which is
absolutely horrible for performance on all levels. The CVS importer
repacks every one thousand commits or something like that.

> Now, after that, I tried doing `git-repack -a` because I wanted to see
> how small the packed archive would be (before trying to continue
> importing the rest of the revisions. There are at least another 100k
> revisions that I should be able to import, eventually.)

I suspect you'd have been better off just re-starting, and using something
like

	while :
	do
		git svnimport -l 1000 <...>
		.. figure out some way to decide if it's all done ..
		git repack -d
	done

which would make svnimport act a bit more sanely, and repack incrementally.
That should both make the import much faster, _and_ avoid any insane big
repack at the end (well, you'd still want to do a "git repack -a -d" at the
end to turn the many smaller packs into a bigger one, but it would be
nicer).

However, I don't know what the proper magic is for svnimport to do that
sane "do it in chunks and tell when you're all done". Or even better - to
just make it repack properly and not keep everything in memory.

> The repack finished after about nine hours, but when I try to do a
> git-verify-pack on it, it dies with this error message:
>
> error: Packfile
> .git/objects/pack/pack-540263fe66ab9398cc796f000d52531a5c6f3df3.pack
> SHA1 mismatch with itself

That sounds suspiciously like the bug we had in our POWER sha1
implementation that would generate the wrong SHA1 for any pack-file that
was over 512MB in size, due to an overflow in 32 bits (SHA1 does some
counting in _bits_, so 512MB is 4G _bits_).

Now, I assume you're not on POWER (and we fixed that bug anyway - and I
think long before 1.4.1 too), but I could easily imagine the same bug in
some other SHA1 implementation (or perhaps _another_ overflow at the 1GB or
2GB mark..). I assume that the pack-file you had was something horrid..

I hope this is with a 64-bit kernel and a 64-bit user space? That should
limit _some_ of the issues. But I would still not be surprised if your SHA1
libraries had some 32-bit ("unsigned int") or 31-bit ("int") limits in them
somewhere - very few people do SHA1's over huge areas, and even when you do
SHA1 on something like a DVD image (which is easily over any 4GB limit),
that tends to be done as many smaller calls to the SHA1 library routines.

Junio - I suspect "pack-check.c" really shouldn't try to do it as one
single humungous "SHA1_Update()" call. It showed one bug on PPC, I wouldn't
be surprised if it's implicated now on some other architecture.

Shawn - does the pack-file-windowing thing already change that? I'm too
lazy to check..

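For what it's worth, the 512MB arithmetic and the "many smaller calls"
approach can be sketched roughly as below. This uses Python's hashlib
rather than git's C code, and the names bit_count_32 and sha1_chunked as
well as the 1MB chunk size are made up for illustration; it is not what
pack-check.c actually does.

```python
import hashlib

# Hypothetical chunk size: 1 MiB per update call, chosen arbitrarily.
CHUNK = 1 << 20


def bit_count_32(nbytes):
    """Message length in bits, as a 32-bit counter would store it."""
    return (nbytes * 8) & 0xFFFFFFFF


def sha1_chunked(data, chunk=CHUNK):
    """Hash data by feeding SHA-1 bounded pieces instead of one huge call."""
    h = hashlib.sha1()
    view = memoryview(data)  # slicing a memoryview avoids copying the data
    for off in range(0, len(view), chunk):
        h.update(view[off:off + chunk])
    return h.hexdigest()
```

Here bit_count_32(512 * 1024 * 1024) comes out as 0, since 512MB is exactly
2**32 bits, which is where a 32-bit bit counter wraps. The chunked digest
is the same as hashing everything in one call, so splitting the update
costs nothing in correctness.
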
As to who knows how to fix git-svnimport to do something saner, I have no
clue.. Sasha seems to have touched it last. Sasha?

		Linus