On Thu, May 26, 2011 at 05:18:11PM +0200, Claire Fousse wrote:

> We based our script on what you called a few months ago the "quick and
> dirty perl script" for the import part and have a few questions about
> it.
> First of all, just in case, here is your original script:
> http://article.gmane.org/gmane.comp.version-control.git/167560
>
> It seems like you first used a hashmap for it to be transformed later
> into a flat list / array. What is the reasoning behind this? Why not
> create an array right away?

The hashmap is actually backed by an on-disk key/value database. The
purpose of this was to allow resuming an import that had failed in the
middle (since even for a moderate-sized wiki like the git wiki, the
import was quite slow).

So the hashmap is indexed by page id, and each value contains an array
of revisions for that page. If we see a page id that we've already
done, we can skip importing it.

If you wanted to do it all at once, yes, you could build a flat array of
revisions, with each revision mentioning the page that it came from, and
just keep appending to the array as you read more data from the wiki.
And then at the end, sort that array based on timestamp to get the
chronological ordering of changes.

Hope that helps,

-Peff
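A rough sketch of the two approaches from the reply, written in Python rather than Perl. This is not the original script. It assumes a hypothetical `fetch_revisions(page_id)` helper and a tiny fake `FAKE_WIKI` dictionary standing in for the slow wiki queries; the revision fields (`page`, `timestamp`, `comment`) are likewise made up for illustration. Python's standard `shelve` module plays the role of the on-disk key/value database.

```python
import os
import shelve
import tempfile

# Hypothetical stand-in for the wiki: page id -> list of (timestamp, comment).
# A real import would query the wiki here, which is the slow part.
FAKE_WIKI = {
    "1": [(1300000300, "edit A2"), (1300000100, "create A")],
    "2": [(1300000200, "create B")],
}


def fetch_revisions(page_id):
    """Hypothetical fetcher: each revision records the page it came from."""
    return [{"page": page_id, "timestamp": ts, "comment": c}
            for ts, c in FAKE_WIKI[page_id]]


def import_resumable(db_path, page_ids):
    """Store each page's revisions in an on-disk map keyed by page id.

    If a run dies partway through, rerunning it skips pages already stored.
    Returns the page ids actually fetched on this run.
    """
    fetched = []
    with shelve.open(db_path) as db:
        for page_id in page_ids:
            if page_id in db:
                continue  # already imported on a previous run
            db[page_id] = fetch_revisions(page_id)
            fetched.append(page_id)
    return fetched


def flatten_chronologically(db_path):
    """Turn the per-page map into one flat list sorted by timestamp."""
    with shelve.open(db_path) as db:
        revisions = [rev for page_id in db for rev in db[page_id]]
    return sorted(revisions, key=lambda rev: rev["timestamp"])


def import_all_at_once(page_ids):
    """Non-resumable variant: append to a flat list, then sort once at the end."""
    revisions = []
    for page_id in page_ids:
        revisions.extend(fetch_revisions(page_id))
    revisions.sort(key=lambda rev: rev["timestamp"])
    return revisions


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "import-state")
    print(import_resumable(path, ["1"]))       # ['1']  (pretend the run died here)
    print(import_resumable(path, ["1", "2"]))  # ['2']  (page 1 is skipped)
    print([r["comment"] for r in flatten_chronologically(path)])
    # ['create A', 'create B', 'edit A2']
```

The trade-off is the one described above: the keyed on-disk map needs an extra flattening pass at the end, but a crashed import can be rerun without refetching finished pages. If the import always runs to completion, the flat list in `import_all_at_once` is simpler.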