Re: Git-Mediawiki : Question about Jeff King's import script

Jeff King <peff@xxxxxxxx> · Thu, 26 May 2011 11:42:14 -0400

On Thu, May 26, 2011 at 05:18:11PM +0200, Claire Fousse wrote:

> We based our script on what you called a few months ago the "quick and
> dirty perl script" for the import part and have a few questions about
> it.
> First of all, just in case, here is your original script:
> http://article.gmane.org/gmane.comp.version-control.git/167560
> 
> It seems that you first build a hashmap and later transform it into a
> flat list / array. What is the reasoning behind this? Why not create
> an array right away?

The hashmap is actually backed by an on-disk key/value database.  The
purpose of this was to allow resuming an import that had failed in the
middle (since even for a moderate-sized wiki like the git wiki, the
import was quite slow).

So the hashmap is indexed by page id, and each value contains an array
of revisions for that page. If we see a page id that we've already done,
we can skip importing it.
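
To illustrate the resume logic described above, here is a minimal sketch.
It is written in Python rather than the Perl of the original script, and it
uses the standard shelve module as a stand-in for the on-disk key/value
database. The names import_pages, db_path and fetch are hypothetical; fetch
stands for whatever slow call retrieves the revisions of one page.

```python
import shelve


def import_pages(page_ids, db_path, fetch):
    """Fetch revisions for each page, skipping pages already stored on disk.

    fetch is a hypothetical callable: page id -> list of revisions.
    """
    # shelve persists a dict-like object to disk, so the state survives
    # a crash or an interrupted run.
    with shelve.open(db_path) as done:
        for page_id in page_ids:
            key = str(page_id)  # shelve keys must be strings
            if key in done:
                continue  # imported in an earlier run, nothing to do
            done[key] = fetch(page_id)
            done.sync()  # flush to disk so a later failure loses nothing
        # Copy into a plain dict before the store is closed.
        return {key: done[key] for key in done}
```

Running import_pages a second time with the same db_path only calls fetch
for page ids that are not yet in the store.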

If you wanted to do it all at once, yes, you could build a flat array of
revisions, with each revision mentioning the page that it came from, and
just keep appending to the array as you read more data from the wiki.
And then at the end, sort that array based on timestamp to get the
chronological ordering of changes.
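
The all-at-once variant could look roughly like the sketch below (again in
Python, not the original Perl). It assumes each revision is a dict with a
"timestamp" field holding an ISO 8601 UTC string, which sorts
chronologically when compared as text; the function name flatten_revisions
and the field names are hypothetical.

```python
from operator import itemgetter


def flatten_revisions(pages):
    """Turn {page_id: [revision, ...]} into one list ordered by timestamp."""
    revisions = []
    for page_id, page_revs in pages.items():
        for rev in page_revs:
            # Each flat entry records which page it came from.
            revisions.append({"page_id": page_id, **rev})
    # Assumes same-format ISO 8601 UTC timestamps, so string order
    # matches chronological order. The sort is stable.
    revisions.sort(key=itemgetter("timestamp"))
    return revisions
```

The sorted list can then be replayed in order, one commit per revision.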

Hope that helps,
-Peff