Re: [PATCH/RFC] file import functionality for git-remote-mw

Matthieu Moy <Matthieu.Moy@xxxxxxxxxxxxxxx> · Mon, 04 Jun 2012 22:54:11 +0200

Pavel Volek <Pavel.Volek@xxxxxxxxxxxxxxx> writes:

> +sub get_mw_media_pages {
> +	mw_connect_maybe();
> +
> +	my %pages; # hash on page titles to avoid duplicates
> +
> +	# get all pages for mediafiles (they are in different namespace)
> +	# only one namespace can be queried at the same moment
> +	my $mw_pages = $mediawiki->list({
> +		action => 'query',
> +		list => 'allpages',
> +		apnamespace => get_mw_namespace_id("File"),
> +		aplimit => 500,
> +	});

This seems to be done unconditionally. Is this reasonable if the user
has explicitely set remote.origin.pages or remote.origin.categories?

Actually, shouldn't this be added to get_mw_pages, next to the code
dealing with these two variables? Perhaps the function should be
split into multiple functions, along the lines of:

sub get_mw_pages {
	mw_connect_maybe();

        my %pages;
	if (@tracked_pages) {
		$user_defined = 1;
		get_mw_tracked_pages(\%pages);
	}
	if (@tracked_categories) {
		$user_defined = 1;
		get_mw_tracked_categories(\%pages);
	}
	if (!$user_defined) {
		get_mw_all_pages(\%pages);
	}
	return values(%pages);
}

And your code would need to take these 3 options into account.

> +sub get_all_mw_pages() {
> +	my @pages = get_mw_pages();
> +	my @media_pages = get_mw_media_pages();
> +	push(@pages,@media_pages);

Space after comma.

> +# Returns MediaWiki id for a canonical namespace name. Ex.: "File", "Project".
> +sub get_mw_namespace_id() {
> +	mw_connect_maybe();
> +	my $name = shift;
> +	my $query = {
> +		action => 'query',
> +		meta => 'siteinfo',
> +		siprop => 'namespaces',
> +	};
> +	my $result = $mediawiki->api($query);

It may make sense to cache the result, to avoid querying the API
multiple times if you call the function more than once. We can even
cache this in a configuration variable as the namespace identifiers are
unlikely to change for a given wiki.

> +	if (!defined($file)){

Space between ) and { please.

> +		my @prefix = split (":",$page_title);

Space after , please.

> +		if ($prefix[0] eq "File" || $prefix[0] eq "Image") {
> +			# check if there is a corresponding mediafile with the same timestamp => it is page
> +			# for new verion of the file (not only for new version of the description of the file)

> +			# => download corresponding file version

Don't make long lines like this. In general, we avoid lines longer than
80 characters (or even a bit less), these are >100 and the following are
worse.

Long lines are usually an indication that you did not structure your
code into functions, and this diagnosis seems to apply here.

> +			my ($imageid,$imageinfo) = each ( %{$result->{query}->{pages}} );

Space after ",".

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html