On Mon, 12 Jun 2006, Jon Smirl wrote:
>
> The svn repository was built by cvs2svn, none of the git tools were
> involved.

Ok, so that part is purely a SVN issue. Having that many files in a
single directory (or two) is a total disaster.

That said, it works well enough if you don't create new files very often
(and _preferably_ don't look them up either, although that is effectively
helped by indexing).

I _suspect_ that

 - the "cvs->svn" import process was probably optimized so that it did
   one file at a time (your "eight stages" description certainly sounds
   as if it could do it), and in that case it's entirely possible that
   that can be done efficiently (ie you still do file creates and lookups
   in an increasingly big directory, but you do it only _once_ per file,
   rather than look up old files all the time). So your lookup ratio
   would be 1:1 with the files.

   Doing a git-cvsimport would then do basically random lookups in that
   _huge_ directory, and instead of reading the files one at a time (and
   fully) and never again, I assume it opens them, reads one revision,
   closes it, and then goes on to the next revision, so it will have a
   much higher lookup ratio (you'd look up every file several times).

 - I suspect the SVN people must be hurting for performance themselves.
   I guess they don't expect to be able to do 5-10 commits per second,
   the way git was designed to do. So they optimized the cvs import
   part, but their actual regular live usage is probably hitting this
   same directory inefficiency.

   Of course, the old SVN Berkeley DB usage was probably even worse (not
   in system time, but I'd expect the access patterns within the BDB
   file to be pretty nasty, and probably a lot of user time spent
   seeking around it). But in this particular case, it might even have
   been better.

Maybe we could teach the SVN people about pack-files? ;)

		Linus
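
PS. If somebody wants to see the access-pattern difference I'm talking
about, here's a rough C sketch (it has nothing to do with the actual
cvs2svn or git-cvsimport code, and "bigdir", the ",v" file names and the
counts are all made up). The first loop creates each file exactly once,
so the lookup ratio is 1:1 with the files; the second loop re-opens files
in random order, so every single open has to search the same huge
directory again:

	#include <stdio.h>
	#include <stdlib.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/stat.h>

	#define NFILES   100000		/* made-up file count */
	#define NLOOKUPS 500000		/* made-up random re-open count */

	int main(void)
	{
		char path[64];
		int i, fd;

		mkdir("bigdir", 0755);	/* ignore EEXIST on re-runs */

		/* "import"-style pass: create each file exactly once */
		for (i = 0; i < NFILES; i++) {
			snprintf(path, sizeof(path), "bigdir/file%d,v", i);
			fd = open(path, O_CREAT | O_WRONLY, 0644);
			if (fd < 0) {
				perror(path);
				return 1;
			}
			close(fd);
		}

		/* "cvsimport"-style pass: repeated random lookups in the
		 * same huge directory */
		srand(1);
		for (i = 0; i < NLOOKUPS; i++) {
			snprintf(path, sizeof(path),
				 "bigdir/file%d,v", rand() % NFILES);
			fd = open(path, O_RDONLY);
			if (fd >= 0)
				close(fd);
		}
		return 0;
	}

Time the two loops separately (cold and warm dcache) and the second one
is where a filesystem with linear directory lookups falls over.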