On Thu, Jun 09, 2011 at 03:15:59PM +0200, Jeremie Nikaes wrote:

> For now, the whole wiki is cloned, but it will be possible to clone only
> some pages: the clone is based on a list of pages which is now all
> pages.

This is not really true anymore, is it? Later you say:

> Partial cloning is supported using the following syntax :
> "git clone mediawiki::http://wikiurl##A_Page##Another_Page"
> As always, this url is kept in .git/config, helping to always keep
> track of these specific pages

so I think it is not "will be" possible anymore.

> +sub get_pages{
> +    my $mediawiki = MediaWiki::API->new;
> +    $mediawiki->{config}->{api_url} = "$url/api.php";
> [...]
> +    } else {
> +        #the list of titles should follow the pattern 'page1|page2|...'
> +        my $titles = "";
> +        foreach my $title (@pages_titles){
> +            $titles.="$title|";
> +        }
> +        #supress the last | that is add in the foreach
> +        chop($titles);

This is usually spelled:

  my $titles = join('|', @pages_titles);

> +        $pages = $mediawiki->api({
> +            action => 'query',
> +            titles => $titles,
> +        });
> +        if (!defined($pages)) {
> +            print STDERR "fatal: None of the pages exist \n";
> +            exit 1;
> +        }

That's not an accurate error message. If the pages don't exist, we will
actually get back a set of pages with negative ids (so you can tell
which ones exist and which ones don't). If $pages is undefined, it's
actually not a valid mediawiki repo.

Also, according to the mediawiki API, we can send only 51 titles at a
time. So we need to break this into pieces.

However, I wonder if this code path is needed at all. We are mapping
titles to page ids, so that we can later ask mediawiki for revisions by
page id. But why not just ask for revisions by title, and skip this
extra round trip to the server?

Speaking of round trips, I did have an idea for reducing round trips in
the "mostly up to date" case. We can ask for the revisions for multiple
titles at once (apparently up to 51, or 501 if you have special bot
privileges), but you will only get the latest revision for each, which
isn't sufficient for us to do anything except tell whether or not there
are any revisions to fetch.

So without the optimization, with N pages we will make N requests for
new revisions. But with it, we will make N/51 requests for the latest
revisions, and then M (where M <= N) requests, one for every page that
actually has new content.

In other words, it is a good optimization as long as fewer than 50/51
of the pages have changed, on average. So it's bad for "clone", but very
good for a subsequent "fetch". The best case is a fetch where nothing
has changed, which should need only about 1/51 as many round trips to
determine that this is the case.

-Peff
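
To make the batched "latest revision per title" check concrete, here is a
rough, untested Perl sketch along the lines described above. It is only
illustrative: the pages_needing_fetch helper, the %$last_imported_rev map
of previously imported revision ids, and the batch size of 50 are
assumptions for the sketch, not part of the patch under review.

  use strict;
  use warnings;
  use MediaWiki::API;

  # Ask for the latest revision of many titles in one request, and return
  # only the titles that have something newer than what we already have.
  # $url, $last_imported_rev (hashref: title => last imported revid) and
  # @titles are assumed to be supplied by the caller.
  sub pages_needing_fetch {
      my ($url, $last_imported_rev, @titles) = @_;
      my $mediawiki = MediaWiki::API->new;
      $mediawiki->{config}->{api_url} = "$url/api.php";

      my $batch_size = 50;   # stay under the per-request title limit
      my @need_fetch;
      while (my @batch = splice(@titles, 0, $batch_size)) {
          my $result = $mediawiki->api({
              action => 'query',
              prop   => 'revisions',
              rvprop => 'ids',
              titles => join('|', @batch),
          }) or die "fatal: could not query the mediawiki API\n";

          foreach my $page (values %{$result->{query}->{pages}}) {
              # Missing pages come back with a negative id and no revisions.
              next unless defined $page->{revisions};
              my $latest = $page->{revisions}->[0]->{revid};
              my $known  = $last_imported_rev->{$page->{title}} || 0;
              push @need_fetch, $page->{title} if $latest > $known;
          }
      }
      return @need_fetch;
  }

A caller could then run the existing one-page-at-a-time revision import
only for the titles this returns, which corresponds to the N/51 + M
request count estimated above.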