On Mon, 12 Jun 2006, Jon Smirl wrote:
>
> The svn repository was built by cvs2svn, none of the git tools were
> involved.

Ok, so that part is purely a SVN issue. Having that many files in a
single directory (or two) is a total disaster.

That said, it works well enough if you don't create new files very often
(and _preferably_ don't look them up either, although that is effectively
helped by indexing).

I _suspect_ that

 - the "cvs->svn" import process was probably optimized so that it did
   one file at a time (your "eight stages" description certainly sounds
   as if it could do it), and in that case it's entirely possible that
   that can be done efficiently (ie you still do file creates and lookups
   in an increasingly big directory, but you do it only _once_ per file,
   rather than look up old files all the time). So your lookup ratio
   would be 1:1 with the files.

   Doing a git-cvsimport would then do basically random lookups in that
   _huge_ directory, and instead of reading the files one at a time (and
   fully) and never again, I assume it opens them, reads one revision,
   closes it, and then goes on to the next revision, so it will have a
   much higher lookup ratio (you'd look up every file several times).

 - I suspect the SVN people must be hurting for performance themselves.
   I guess they don't expect to be able to do 5-10 commits per second,
   the way git was designed to do. So they optimized the cvs import
   part, but their actual regular live usage is probably hitting this
   same directory inefficiency.

   Of course, the old SVN Berkeley DB usage was probably even worse (not
   in system time, but I'd expect the access patterns within the BDB
   file to be pretty nasty, and probably a lot of user time spent
   seeking around it). But in this particular case, it might even have
   been better.

Maybe we could teach the SVN people about pack-files? ;)

		Linus
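
PS. If somebody wants to see the access-pattern difference I'm talking
about, here's a rough C sketch (it has nothing to do with the actual
cvs2svn or git-cvsimport code, and "bigdir", the ",v" file names and the
counts are all made up). The first loop creates each file exactly once,
so the lookup ratio is 1:1 with the files; the second loop re-opens files
in random order, so every single open has to search the same huge
directory again:

	#include <stdio.h>
	#include <stdlib.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/stat.h>

	#define NFILES   100000		/* made-up file count */
	#define NLOOKUPS 500000		/* made-up random re-open count */

	int main(void)
	{
		char path[64];
		int i, fd;

		mkdir("bigdir", 0755);	/* ignore EEXIST on re-runs */

		/* "import"-style pass: create each file exactly once */
		for (i = 0; i < NFILES; i++) {
			snprintf(path, sizeof(path), "bigdir/file%d,v", i);
			fd = open(path, O_CREAT | O_WRONLY, 0644);
			if (fd < 0) {
				perror(path);
				return 1;
			}
			close(fd);
		}

		/* "cvsimport"-style pass: repeated random lookups in the
		 * same huge directory */
		srand(1);
		for (i = 0; i < NLOOKUPS; i++) {
			snprintf(path, sizeof(path),
				 "bigdir/file%d,v", rand() % NFILES);
			fd = open(path, O_RDONLY);
			if (fd >= 0)
				close(fd);
		}
		return 0;
	}

Time the two loops separately (cold and warm dcache) and the second one
is where a filesystem with linear directory lookups falls over.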