Sebastian Bober <sbober@xxxxxxxxxxxxx> wrote:
> The question would be how the commits and the trees are laid out.
> If every wiki revision is to be a git commit, then we'd need to handle
> 300M commits. And we have 19M wiki pages (which would be files). The
> tree objects would be very large and git-fast-import would crawl.
>
> Some tests with the German Wikipedia have shown that importing the
> blobs is doable on normal hardware. Getting the trees and commits into
> git has not been possible up to now, as fast-import was just too slow
> (and got slower after 1M commits).

Well, to be fair to fast-import, its tree handling code is based on
linear scans, because that is how every other part of Git handles
trees. If you just toss all 19M wiki pages into a single top-level
tree, it is going to take a very long time to locate the wiki page
talking about Zoos.

> I had the idea of writing an importer that would handle just this
> special case (one file change per commit), but I haven't gotten
> around to trying it yet.

Really, fast-import should be able to handle this well, assuming you
aren't just tossing all 19M files into a single massive directory and
hoping for the best, because *any* program working on that sort of
layout will need to spit out the 19M-entry tree object on each and
every commit, just so it can compute the SHA-1 checksum to get the
tree name for the commit.

--
Shawn.
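To put rough numbers on that: a tree entry is the mode, a space, the
file name, a NUL, and a 20-byte binary SHA-1, so with page titles
averaging a couple dozen bytes a single 19M-entry tree serializes to
roughly a gigabyte that would have to be regenerated and hashed for
every one of the 300M commits. Fanning the pages out over a couple of
levels of hash-named subdirectories keeps every individual tree small.
The sketch below shows one way to emit such a stream for fast-import,
one file change per commit; the revision reader, the committer
address, and the two-level fan-out depth are illustrative placeholders,
not anything from the actual Wikipedia dump tooling:

#!/usr/bin/env python3
import hashlib
import sys

def fanout_path(title):
    # Two levels of 256-way fan-out keyed on a hash of the page title:
    # ~19M pages / 65536 buckets is only a few hundred entries per
    # leaf tree, so no single tree object ever gets huge.
    h = hashlib.sha1(title.encode("utf-8")).hexdigest()
    return "%s/%s/%s" % (h[0:2], h[2:4], title.replace("/", "_"))

def data(payload):
    # A fast-import 'data' command: byte count, newline, raw bytes.
    return b"data %d\n" % len(payload) + payload

def emit_commit(out, title, timestamp, author, text):
    # One commit touching exactly one file, matching the
    # one-change-per-revision model discussed above.
    path = fanout_path(title).encode("utf-8")
    msg = ("update %s\n" % title).encode("utf-8")
    blob = text.encode("utf-8")
    out.write(b"commit refs/heads/master\n")
    out.write(b"committer %s <wiki@example.invalid> %d +0000\n"
              % (author.encode("utf-8"), timestamp))
    out.write(data(msg))
    out.write(b"M 100644 inline %s\n" % path)
    out.write(data(blob))
    out.write(b"\n")

def read_revisions():
    # Placeholder for a real dump reader; yields
    # (title, unix_timestamp, author, wikitext) tuples in history order.
    yield ("Zoo", 1234567890, "Example Editor", "A zoo is a place...\n")

def main():
    out = sys.stdout.buffer
    for title, ts, author, text in read_revisions():
        emit_commit(out, title, ts, author, text)
    out.flush()

if __name__ == "__main__":
    main()

Piped into "git fast-import" inside a fresh repository, that layout
keeps every directory at a few hundred entries, so the linear scans
stay cheap no matter how many commits go through.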