Re: [RFC/PATCH] Added a remote helper to interact with mediawiki, pull & clone handled

2011/6/2 Jeff King <peff@xxxxxxxx>:
> On Thu, Jun 02, 2011 at 11:28:31AM +0200, Arnaud Lacurie wrote:
>
>> +sub mw_import {
>> [...]
>> +             # Get 500 revisions at a time due to the mediawiki api limit
>> +             while (1) {
>> +                     my $result = $mediawiki->api($query);
>> +
>> +                     # Parse each of those 500 revisions
>> +                     foreach my $revision (@{$result->{query}->{pages}->{$id}->{revisions}}) {
>> +                             my $page_rev_ids;
>> +                             $page_rev_ids->{pageid} = $page->{pageid};
>> +                             $page_rev_ids->{revid} = $revision->{revid};
>> +                             push (@revisions, $page_rev_ids);
>> +                             $revnum++;
>> +                     }
>> +                     last unless $result->{'query-continue'};
>> +                     $query->{rvstartid} = $result->{'query-continue'}->{revisions}->{rvstartid};
>> +                     print "\n";
>> +             }
>
> What is this newline at the end here for? With it, my import reliably
> fails with:
>
>  fatal: Unsupported command:
>  fast-import: dumping crash report to .git/fast_import_crash_6091
>
> Removing it seems to make things work.

Yes, we actually found it today. It slipped through because we have not
fetched a page with more than 500 revisions since that line was added,
so the print after the 'query-continue' check was never reached.
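
For reference, a minimal sketch of the loop without the stray newline.
It assumes a hypothetical collect_revisions helper and an $api_call code
reference standing in for $mediawiki->api (neither is in the patch).
Anything printed on STDOUT is read by fast-import as a command, so any
progress output goes to STDERR instead:

```perl
use strict;
use warnings;

# Hypothetical helper: $api_call stands in for $mediawiki->api($query).
sub collect_revisions {
    my ($api_call, $query, $id) = @_;
    my @revisions;
    while (1) {
        my $result = $api_call->($query);
        my $revs = $result->{query}->{pages}->{$id}->{revisions} || [];
        foreach my $revision (@$revs) {
            push @revisions, { pageid => $id, revid => $revision->{revid} };
        }
        # Stop when the API reports no further batch.
        last unless $result->{'query-continue'};
        $query->{rvstartid} = $result->{'query-continue'}->{revisions}->{rvstartid};
        # Progress goes to STDERR; STDOUT carries only the fast-import stream.
        print STDERR "fetched ", scalar(@revisions), " revisions so far\n";
    }
    return @revisions;
}
```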

>> +             # mediawiki revision number in the git note
>> +             my $note_comment = encode_utf8("note added by git-mediawiki");
>> +             my $note_comment_length = bytes::length($note_comment);
>> +             my $note_content = encode_utf8("mediawiki_revision: " . $pagerevids->{revid} . "\n");
>> +             my $note_content_length = bytes::length($note_content);
>> +
>> +             if ($fetch_from == 1 && $n == 1) {
>> +                     print "reset refs/notes/commits\n";
>> +             }
>> +             print "commit refs/notes/commits\n";
>
> Should these go in refs/notes/commits? I don't think we have a "best
> practices" yet for the notes namespaces, as it is still a relatively new
> concept. But I always thought "refs/notes/commits" would be for the
> user's "regular" notes, and that programmatic things would get their own
> notes, like "refs/notes/mediawiki".
>
That's a good idea; we had not realized that notes could live anywhere
other than refs/notes/commits. A separate ref such as refs/notes/mediawiki
will be perfect for distinguishing the user's notes from ours.
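
A rough sketch of what the note commit could look like with a dedicated
ref. The note_commands helper and the committer identity are made up for
illustration and are not in the patch; the ref name is the one suggested
above, and the commands follow the fast-import "commit" and "N inline"
syntax:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: returns the fast-import commands that attach a note
# to $commit (a mark such as ":1" or a SHA-1) under a dedicated notes ref.
sub note_commands {
    my ($revid, $commit, $timestamp) = @_;
    my $ref     = "refs/notes/mediawiki";    # ref name suggested in this thread
    my $message = encode_utf8("note added by git-mediawiki");
    my $content = encode_utf8("mediawiki_revision: $revid\n");
    return join("",
        "commit $ref\n",
        # Placeholder identity, an assumption for this sketch.
        "committer example <example\@example.com> $timestamp +0000\n",
        # length() counts bytes here because both strings are already encoded.
        "data " . length($message) . "\n$message\n",
        "N inline $commit\n",
        "data " . length($content) . "\n$content\n",
    );
}

# Usage: print note_commands(42, ":1", time);
```

Notes stored this way should then be readable with
"git notes --ref=mediawiki show <commit>", leaving refs/notes/commits
untouched.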
>
>> +             } else {
>> +                     print STDERR "You appear to have cloned an empty mediawiki\n";
>> +                     #What do we have to do here ? If nothing is done, an error is thrown saying that
>> +                     #HEAD is refering to unknown object 0000000000000000000
>> +             }
>
> Hmm. We do allow cloning empty git repos. It might be nice for there to
> be some way for a remote helper to signal "everything OK, but the result
> is empty". But I think that is probably something that needs to be added
> to the remote-helper protocol, and so is outside the scope of your
> script (maybe it is as simple as interpreting the null sha1 as "empty";
> I dunno).
>

Yes, that's a problem we've been running into. We didn't really know
how to solve it.

> Overall, it's looking pretty good. I like that I can resume a
> half-finished import via "git fetch". Though I do have one complaint:
> running "git fetch" fetches the metainfo for every revision of every
> page, just as it does for an initial clone. Is there something in the
> mediawiki API to say "show me revisions since N" (where N would be the
> mediawiki revision of the tip of what we imported)?

I am not sure I understand your question, because we actually support
this already, thanks to git notes. When you run "git fetch" after a
clone, it only checks the latest revisions.
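
To make the idea concrete, here is a small sketch of how the revision
number stored in the note could drive a "revisions since N" query. The
three helpers are hypothetical and not taken from the patch. The query
uses the prop=revisions parameters rvstartid and rvdir; it starts at the
last imported revision, which is known to exist, and drops it from the
results rather than guessing the next id:

```perl
use strict;
use warnings;

# Hypothetical helpers, not taken from the patch.

# Extract the revision id from a note body like "mediawiki_revision: 42\n".
# Returns 0 when the note does not carry a revision id.
sub revid_from_note {
    my ($note) = @_;
    return $note =~ /^mediawiki_revision: (\d+)$/m ? $1 : 0;
}

# Build a prop=revisions query listing revisions from $last_revid onwards,
# oldest first. The starting revision itself is included by the API.
sub incremental_query {
    my ($title, $last_revid) = @_;
    return {
        action    => 'query',
        prop      => 'revisions',
        titles    => $title,
        rvlimit   => 500,
        rvdir     => 'newer',
        rvstartid => $last_revid,
    };
}

# Keep only the revisions that are newer than the last imported one.
sub new_revisions {
    my ($last_revid, @revisions) = @_;
    return grep { $_->{revid} > $last_revid } @revisions;
}
```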

Thank you very much for your help!

Arnaud Lacurie