Re: [RFC/PATCH] Added a remote helper to interact with mediawiki, pull & clone handled

Arnaud Lacurie <arnaud.lacurie@xxxxxxxxxxxxxxx> · Thu, 2 Jun 2011 22:58:44 +0200

2011/6/2 Junio C Hamano <gitster@xxxxxxxxx>:
> Arnaud Lacurie <arnaud.lacurie@xxxxxxxxxxxxxxx> writes:
>
>>  contrib/mw-to-git/git-remote-mediawiki     |  252 ++++++++++++++++++++++++++++
>>  contrib/mw-to-git/git-remote-mediawiki.txt |    7 +
>>  2 files changed, 259 insertions(+), 0 deletions(-)
>
> It is pleasing to see that a half of a custom backend can be done in just
> 250 lines of code.  I understand that this is a work-in-progress with many
> unnecessary lines spitting debugging output to STDERR, whose removal will
> further shrink the code?

To give you an idea, the whole thing with the other part is about 400
lines for now.
There are a lot of output to STDERR that will be eventually removed.
We still have to
decide what "option verbosity" should display before removing to many of them.

>> +# commands parser
>> +my $loop = 1;
>> +my $entry;
>> +my @cmd;
>> +while ($loop) {
>
> This is somewhat unusual-looking loop control.
>
> Wouldn't "while (1) { ...; last if (...); if (...) { last; } }" do?

Ok for that loop control.

>> +     $| = 1; #flush STDOUT
>> +     $entry = <STDIN>;
>> +     print STDERR $entry;
>> +     chomp($entry);
>> +     @cmd = undef;
>> +     @cmd = split(/ /,$entry);
>> +     switch ($cmd[0]) {
>> +             case "capabilities" {
>> +                     if ($cmd[1] eq "") {
>> +                             mw_capabilities();
>> +                     } else {
>> +                            $loop = 0;
>
> I presume that this is "We were expecting to read capabilities command but
> found something unexpected; let's abort". Don't you want to say something
> to the user here, perhaps on STDERR?
>

Actually, we based that "no error messages displayed" policy on the
git-remote-http which aborts if something unexpected is found. We could add an
error message though.

>> +                     }
>> +             }
>> ...
>> +             case "option" {
>> +                     mw_option($cmd[1],$cmd[2]);
>> +             }
>
> No error checking only for this one?

There should be one. That's a miss.

>> +                     my @pushargs = split(/:/,$cmd[1]);
>> +                     if ($pushargs[1] ne "" && $pushargs[2] eq ""
>> +                     && (substr($pushargs[0],0,1) eq "+")) {
>> +                             mw_push(substr($pushargs[0],1),$pushargs[1]);
>> +                     } else {
>> +                            $loop = 0;
>> +                     }
>
> Is "push" always forcing?

The tests are done in the mw_push function which is still being implemented.
The push will abort if there is something to pull, just like git acts.

>> +sub mw_import {
>> +     my @wiki_name = split(/:\/\//,$url);
>> +     my $wiki_name = $wiki_name[1];
>> +
>> +     my $mediawiki = MediaWiki::API->new;
>> +     $mediawiki->{config}->{api_url} = "$url/api.php";
>> +
>> +     my $pages = $mediawiki->list({
>> +             action => 'query',
>> +             list => 'allpages',
>> +             aplimit => 500,
>> +     });
>> +     if ($pages == undef) {
>> +             print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
>> +             print STDERR "fatal: make sure '$url/api.php' is a valid page\n";
>> +             exit;
>> +     }
>> +
>> +     my @revisions;
>> +     print STDERR "Searching revisions...\n";
>> +     my $fetch_from = get_last_local_revision() + 1;
>> +     my $n = 1;
>> +     foreach my $page (@$pages) {
>> +             my $id = $page->{pageid};
>> +
>> +             print STDERR "$n/", scalar(@$pages), ": $page->{title}\n";
>> +             $n++;
>> +
>> +             my $query = {
>> +                     action => 'query',
>> +                     prop => 'revisions',
>> +                     rvprop => 'ids',
>> +                     rvdir => 'newer',
>> +                     rvstartid => $fetch_from,
>> +                     rvlimit => 500,
>> +                     pageids => $page->{pageid},
>> +             };
>> +
>> +             my $revnum = 0;
>> +             # Get 500 revisions at a time due to the mediawiki api limit
>
> It's nice that you can dig deeper with rvlimit increments. I wonder if
> 'allpages' also let's you retrieve more than 500 pages in total by somehow
> iterating over the set of pages.

Yes it is possible. And works with the removal of the "\n" peff
pointed out in the previous message.

>> +     # Creation of the fast-import stream
>> +     print STDERR "Fetching & writing export data...\n";
>> +     binmode STDOUT, ':binary';
>> +     $n = 0;
>> +
>> +     foreach my $pagerevids (sort {$a->{revid} <=> $b->{revid}} @revisions) {
>> +             #fetch the content of the pages
>> +             my $query = {
>> +                     action => 'query',
>> +                     prop => 'revisions',
>> +                     rvprop => 'content|timestamp|comment|user|ids',
>> +                     revids => $pagerevids->{revid},
>> +             };
>> +
>> +             my $result = $mediawiki->api($query);
>> +
>> +             my $rev = pop(@{$result->{query}->{pages}->{$pagerevids->{pageid}}->{revisions}});
>
> Is the list of per-page revisions guaranteed to be sorted (not a
> rhetorical question; just asking)?

Yes it is.

>> +             print "commit refs/mediawiki/$remotename/master\n";
>> +             print "mark :$n\n";
>> +             print "committer $user <$user\@$wiki_name> ", $dt->epoch, " +0000\n";
>> +             print "data ", bytes::length(encode_utf8($comment)), "\n", encode_utf8($comment);
>
> Calling encode_utf8() twice on the same data?  How big is this $comment
> typically?  Or does encode_utf8() somehow memoize?
>
>> +             # If it's not a clone, needs to know where to start from
>> +             if ($fetch_from != 1 && $n == 1) {
>> +                     print "from refs/mediawiki/$remotename/master^0\n";
>> +             }
>> +             print "M 644 inline $title.wiki\n";
>> +             print "data ", bytes::length(encode_utf8($content)), "\n", encode_utf8($content);
>
> Same for $content, which presumably is larger than $comment...
>
> Perhaps a small helper
>
>        sub literal_data {
>                my ($content) = @_;
>                print "data ", bytes::length($content), "\n", $content;
>        }
>
> would help here, above, and below where you create a "note" on this
> commit?
>
>> +             # mediawiki revision number in the git note
>> +             my $note_comment = encode_utf8("note added by git-mediawiki");
>> +             my $note_comment_length = bytes::length($note_comment);
>> +             my $note_content = encode_utf8("mediawiki_revision: " . $pagerevids->{revid} . "\n");
>> +             my $note_content_length = bytes::length($note_content);
>> +
>> +             if ($fetch_from == 1 && $n == 1) {
>> +                     print "reset refs/notes/commits\n";
>> +             }
>> +             print "commit refs/notes/commits\n";
>> +             print "committer $user <user\@example.com> ", $dt->epoch, " +0000\n";
>> +             print "data ", $note_comment_length, "\n", $note_comment;
>
> With that, this will become
>
>        literal_data(encode_utf8("note added by git-mediawiki"));
>
> and you don't need two extra variables.  Same for $note_content*.

Thank you very much for this helper, it'll help factoring the code and
reducing the number of variables.

Thank you very much for your help and comments on that project.

Regards

-- 
Arnaud Lacurie
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html