2011/6/2 Jeff King <peff@xxxxxxxx>:
> On Thu, Jun 02, 2011 at 11:28:31AM +0200, Arnaud Lacurie wrote:
>
>> +sub mw_import {
>> [...]
>> +    # Get 500 revisions at a time due to the mediawiki api limit
>> +    while (1) {
>> +        my $result = $mediawiki->api($query);
>> +
>> +        # Parse each of those 500 revisions
>> +        foreach my $revision (@{$result->{query}->{pages}->{$id}->{revisions}}) {
>> +            my $page_rev_ids;
>> +            $page_rev_ids->{pageid} = $page->{pageid};
>> +            $page_rev_ids->{revid} = $revision->{revid};
>> +            push (@revisions, $page_rev_ids);
>> +            $revnum++;
>> +        }
>> +        last unless $result->{'query-continue'};
>> +        $query->{rvstartid} = $result->{'query-continue'}->{revisions}->{rvstartid};
>> +        print "\n";
>> +    }
>
> What is this newline at the end here for? With it, my import reliably
> fails with:
>
>   fatal: Unsupported command:
>   fast-import: dumping crash report to .git/fast_import_crash_6091
>
> Removing it seems to make things work.

Yes, we actually found it today. It slipped through because we had never
fetched a page with more than 500 revisions since that line was added...

>> +    # mediawiki revision number in the git note
>> +    my $note_comment = encode_utf8("note added by git-mediawiki");
>> +    my $note_comment_length = bytes::length($note_comment);
>> +    my $note_content = encode_utf8("mediawiki_revision: " . $pagerevids->{revid} . "\n");
>> +    my $note_content_length = bytes::length($note_content);
>> +
>> +    if ($fetch_from == 1 && $n == 1) {
>> +        print "reset refs/notes/commits\n";
>> +    }
>> +    print "commit refs/notes/commits\n";
>
> Should these go in refs/notes/commits? I don't think we have a "best
> practices" yet for the notes namespaces, as it is still a relatively new
> concept. But I always thought "refs/notes/commits" would be for the
> user's "regular" notes, and that programmatic things would get their own
> notes, like "refs/notes/mediawiki".

That's a good idea; we didn't realize notes could go anywhere other than
refs/notes/commits. A dedicated ref like refs/notes/mediawiki will be
perfect for keeping our notes separate from the user's.

>> +    } else {
>> +        print STDERR "You appear to have cloned an empty mediawiki\n";
>> +        #What do we have to do here ? If nothing is done, an error is thrown saying that
>> +        #HEAD is refering to unknown object 0000000000000000000
>> +    }
>
> Hmm. We do allow cloning empty git repos. It might be nice for there to
> be some way for a remote helper to signal "everything OK, but the result
> is empty". But I think that is probably something that needs to be added
> to the remote-helper protocol, and so is outside the scope of your
> script (maybe it is as simple as interpreting the null sha1 as "empty";
> I dunno).

Yes, that's a problem we've been running into; we didn't really know how
to solve it.

> Overall, it's looking pretty good. I like that I can resume a
> half-finished import via "git fetch". Though I do have one complaint:
> running "git fetch" fetches the metainfo for every revision of every
> page, just as it does for an initial clone. Is there something in the
> mediawiki API to say "show me revisions since N" (where N would be the
> mediawiki revision of the tip of what we imported)?

I'm not sure I understand the question, because we already support this,
thanks to git notes: when you run "git fetch" after a clone, the helper
reads the last imported mediawiki revision from the notes and only
fetches the revisions that came after it.

Thank you very much for your help!

Arnaud Lacurie
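
PS: To be concrete about the notes ref change, here is roughly the
fast-import stream we would emit instead. This is only a sketch:
$pagerevids, $fetch_from and $n are the same variables as in the patch
above, the committer identity is a hypothetical placeholder (the real
script would build one from the wiki data), it assumes the commit being
annotated was given mark :$n, and the "from" handling needed to extend an
existing notes ref on later runs is left out.

    use Encode qw(encode_utf8);
    use bytes ();    # for bytes::length

    # Same note payload as in the patch, but written to a dedicated
    # notes ref (refs/notes/mediawiki) instead of refs/notes/commits.
    my $note_comment = encode_utf8("note added by git-mediawiki");
    my $note_comment_length = bytes::length($note_comment);
    my $note_content = encode_utf8("mediawiki_revision: " . $pagerevids->{revid} . "\n");
    my $note_content_length = bytes::length($note_content);

    if ($fetch_from == 1 && $n == 1) {
        print "reset refs/notes/mediawiki\n";
    }
    print "commit refs/notes/mediawiki\n";
    # Placeholder identity and address, purely illustrative.
    print "committer git-mediawiki <git-mediawiki\@example.com> " . time() . " +0000\n";
    print "data $note_comment_length\n" . $note_comment . "\n";
    # Attach the note to the imported commit referenced by mark :$n.
    print "N inline :$n\n";
    print "data $note_content_length\n" . $note_content . "\n";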
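
PPS: Regarding "show me revisions since N": the MediaWiki API accepts
rvstartid together with rvdir=newer, which is what makes the incremental
fetch possible. Here is a simplified sketch of the lookup side; $query is
the API query hash from the patch, it assumes the notes end up in
refs/notes/mediawiki as discussed above, and refs/mediawiki/master is
just a stand-in for whatever ref the import actually writes:

    # Read the "mediawiki_revision:" note on the current tip to find the
    # last wiki revision that has already been imported.
    sub get_last_mediawiki_revision {
        my $note = `git notes --ref=mediawiki show refs/mediawiki/master 2>/dev/null`;
        return $1 if $note =~ /^mediawiki_revision: (\d+)$/m;
        return undef;    # nothing imported yet
    }

    my $last = get_last_mediawiki_revision();
    if (defined $last) {
        # Only ask the wiki for revisions newer than the ones we have.
        $query->{rvdir}     = 'newer';
        $query->{rvstartid} = $last + 1;
    }

On a fresh clone there is no note yet, so $query is left alone and
everything is imported from the first revision.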