Pavel Volek <Pavel.Volek@xxxxxxxxxxxxxxx> writes: > +sub get_mw_media_pages { > + mw_connect_maybe(); > + > + my %pages; # hash on page titles to avoid duplicates > + > + # get all pages for mediafiles (they are in different namespace) > + # only one namespace can be queried at the same moment > + my $mw_pages = $mediawiki->list({ > + action => 'query', > + list => 'allpages', > + apnamespace => get_mw_namespace_id("File"), > + aplimit => 500, > + }); This seems to be done unconditionally. Is this reasonable if the user has explicitely set remote.origin.pages or remote.origin.categories? Actually, shouldn't this be added to get_mw_pages, next to the code dealing with these two variables? Perhaps the function should be split into multiple functions, along the lines of: sub get_mw_pages { mw_connect_maybe(); my %pages; if (@tracked_pages) { $user_defined = 1; get_mw_tracked_pages(\%pages); } if (@tracked_categories) { $user_defined = 1; get_mw_tracked_categories(\%pages); } if (!$user_defined) { get_mw_all_pages(\%pages); } return values(%pages); } And your code would need to take these 3 options into account. > +sub get_all_mw_pages() { > + my @pages = get_mw_pages(); > + my @media_pages = get_mw_media_pages(); > + push(@pages,@media_pages); Space after comma. > +# Returns MediaWiki id for a canonical namespace name. Ex.: "File", "Project". > +sub get_mw_namespace_id() { > + mw_connect_maybe(); > + my $name = shift; > + my $query = { > + action => 'query', > + meta => 'siteinfo', > + siprop => 'namespaces', > + }; > + my $result = $mediawiki->api($query); It may make sense to cache the result, to avoid querying the API multiple times if you call the function more than once. We can even cache this in a configuration variable as the namespace identifiers are unlikely to change for a given wiki. > + if (!defined($file)){ Space between ) and { please. > + my @prefix = split (":",$page_title); Space after , please. > + if ($prefix[0] eq "File" || $prefix[0] eq "Image") { > + # check if there is a corresponding mediafile with the same timestamp => it is page > + # for new verion of the file (not only for new version of the description of the file) > + # => download corresponding file version Don't make long lines like this. In general, we avoid lines longer than 80 characters (or even a bit less), these are >100 and the following are worse. Long lines are usually an indication that you did not structure your code into functions, and this diagnosis seems to apply here. > + my ($imageid,$imageinfo) = each ( %{$result->{query}->{pages}} ); Space after ",". -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html