Re: [RFC/PATCH] Added a remote helper to interact with mediawiki, pull & clone handled

On Thu, Jun 02, 2011 at 11:28:31AM +0200, Arnaud Lacurie wrote:

> +sub mw_import {
> [...]
> +		# Get 500 revisions at a time due to the mediawiki api limit
> +		while (1) {
> +			my $result = $mediawiki->api($query);
> +
> +			# Parse each of those 500 revisions
> +			foreach my $revision (@{$result->{query}->{pages}->{$id}->{revisions}}) {
> +				my $page_rev_ids;
> +				$page_rev_ids->{pageid} = $page->{pageid};
> +				$page_rev_ids->{revid} = $revision->{revid};
> +				push (@revisions, $page_rev_ids);
> +				$revnum++;
> +			}
> +			last unless $result->{'query-continue'};
> +			$query->{rvstartid} = $result->{'query-continue'}->{revisions}->{rvstartid};
> +			print "\n";
> +		}

What is this newline at the end here for? With it, my import reliably
fails with:

  fatal: Unsupported command: 
  fast-import: dumping crash report to .git/fast_import_crash_6091

Removing it seems to make things work.
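
If the newline was meant as progress output, one option is to send it
to stderr, since the crash above suggests that everything on stdout is
parsed as a fast-import command. A minimal sketch; report_progress is a
hypothetical name, not something in the patch:

```perl
use strict;
use warnings;

# Hypothetical helper: write progress to the given filehandle.
# Callers pass STDERR so that STDOUT carries only fast-import commands.
sub report_progress {
    my ($fh, $fetched) = @_;
    print {$fh} "Fetched $fetched revisions so far\n";
}

# Inside the paging loop, instead of print "\n":
report_progress(\*STDERR, 500);
```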

> +		my $user = $rev->{user} || 'Anonymous';
> +		my $dt = DateTime::Format::ISO8601->parse_datetime($rev->{timestamp});
> +
> +		my $comment = defined $rev->{comment} ? $rev->{comment} : '*Empty MediaWiki Message*';

While importing the git wiki, I ran into a revision with an empty
timestamp. Parsing it throws an exception that kills the whole import:

  $ git clone mediawiki::https://git.wiki.kernel.org/ git-wiki
  2821/7949: Revision n°4210 of GitSurvey
  Invalid date format:  at /home/peff/compile/git/contrib/mw-to-git/git-remote-mediawiki line 195
          main::mw_import('https://git.wiki.kernel.org/') called at /home/peff/compile/git/contrib/mw-to-git/git-remote-mediawiki line 42

At the very least, we should intercept this and put in some placeholder
timestamp. I'm not sure what the best placeholder would be. Maybe use
the date from the previous revision, plus one second? Or maybe there is
some other bug causing us to have an empty timestamp. I didn't dig
deeper yet.
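
The "previous revision plus one second" idea could look roughly like
this. It is only a sketch: revision_epoch is a hypothetical helper, and
to keep it free of non-core modules it swaps DateTime::Format::ISO8601
for a regex plus Time::Local, assuming timestamps of the form
"2011-06-02T09:28:31Z". Falling back to 0 when there is no previous
revision is also an arbitrary choice.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper, not part of the patch: return epoch seconds (UTC)
# for a MediaWiki timestamp. If the timestamp is missing or malformed,
# return the previous revision's time plus one second, or 0 when there
# is no previous revision.
sub revision_epoch {
    my ($timestamp, $prev_epoch) = @_;

    my @f = defined $timestamp
        ? $timestamp =~ /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z$/
        : ();
    if (@f) {
        # timegm dies on out-of-range fields, so trap that as well.
        my $epoch = eval {
            timegm($f[5], $f[4], $f[3], $f[2], $f[1] - 1, $f[0]);
        };
        return $epoch if defined $epoch;
    }
    return defined $prev_epoch ? $prev_epoch + 1 : 0;
}
```

The caller would keep the last returned value around and pass it in as
$prev_epoch for the next revision.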

> +		# mediawiki revision number in the git note
> +		my $note_comment = encode_utf8("note added by git-mediawiki");
> +		my $note_comment_length = bytes::length($note_comment);
> +		my $note_content = encode_utf8("mediawiki_revision: " . $pagerevids->{revid} . "\n");
> +		my $note_content_length = bytes::length($note_content);
> +
> +		if ($fetch_from == 1 && $n == 1) {
> +			print "reset refs/notes/commits\n";
> +		}
> +		print "commit refs/notes/commits\n";

Should these go in refs/notes/commits? I don't think we have a "best
practices" yet for the notes namespaces, as it is still a relatively new
concept. But I always thought "refs/notes/commits" would be for the
user's "regular" notes, and that programmatic things would get their own
notes, like "refs/notes/mediawiki".

That wouldn't show them by default, but you could do:

  git log --notes=mediawiki

to see them (and maybe that is a feature, because most of the time you
won't care about the mediawiki revision).
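
On the helper side that would mostly mean changing the ref name in the
fast-import stream. A rough sketch, assuming the noted commit is
referenced by a fast-import mark; note_commands and its arguments are
made-up names, and the committer line is passed in ready-made:

```perl
use strict;
use warnings;

# Assumed ref name, taken from the suggestion above.
my $NOTES_REF = 'refs/notes/mediawiki';

# Hypothetical helper: build the fast-import commands that attach a note
# holding the MediaWiki revision id to the commit with mark $commit_mark.
# Both payloads are ASCII here, so length() equals the byte count.
sub note_commands {
    my ($commit_mark, $revid, $committer_line) = @_;
    my $message = "note added by git-mediawiki";
    my $content = "mediawiki_revision: $revid\n";
    return join '',
        "commit $NOTES_REF\n",
        "$committer_line\n",
        "data " . length($message) . "\n$message\n",
        "N inline :$commit_mark\n",
        "data " . length($content) . "\n$content\n";
}

print note_commands(1, 4210,
    'committer Wiki <wiki@example.org> 1306999711 +0000');
```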

> +		} else {
> +			print STDERR "You appear to have cloned an empty mediawiki\n";
> +			#What do we have to do here ? If nothing is done, an error is thrown saying that
> +			#HEAD is refering to unknown object 0000000000000000000
> +		}

Hmm. We do allow cloning empty git repos. It might be nice for there to
be some way for a remote helper to signal "everything OK, but the result
is empty". But I think that is probably something that needs to be added
to the remote-helper protocol, and so is outside the scope of your
script (maybe it is as simple as interpreting the null sha1 as "empty";
I dunno).

Overall, it's looking pretty good. I like that I can resume a
half-finished import via "git fetch". Though I do have one complaint:
running "git fetch" fetches the metainfo for every revision of every
page, just as it does for an initial clone. Is there something in the
mediawiki API to say "show me revisions since N" (where N would be the
mediawiki revision of the tip of what we imported)?
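
One possibility, untested against a real wiki: the revisions query
already takes rvstartid (the patch uses it for continuation), and
combined with rvdir=newer it might be usable to ask only for revisions
after the last imported one. A sketch of such a query; the function
name and the other parameter values are my assumptions, and I have not
checked how the server treats a start id that does not belong to the
page:

```perl
use strict;
use warnings;

# Hypothetical helper: build a per-page query that enumerates revisions
# from oldest to newest, starting just after the last imported revision.
sub incremental_query {
    my ($pageid, $last_revid) = @_;
    return {
        action    => 'query',
        prop      => 'revisions',
        pageids   => $pageid,
        rvprop    => 'ids',
        rvdir     => 'newer',          # oldest first
        rvstartid => $last_revid + 1,  # skip what we already have
        rvlimit   => 500,
    };
}

my $query = incremental_query(42, 4210);
```

This still issues one query per page, so it would cut down the amount
of metainfo transferred but not the number of requests.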

-Peff