Re: [PATCH v2 07/22] git-remote-mediawiki: Change style of some regular expressions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 7, 2013 at 5:42 PM, Célestin Matte
<celestin.matte@xxxxxxxxxx> wrote:
> - Remove m modifier when useless (m// and // was used randomly; this makes the
> code more coherent)
> - Remove stringy split (split('c', ...) instead of split(/c/, ...))
> - Use {}{} instead of /// when slashes are used inside the regexp so as not to
> escape it.

When you split up 1/22 from v1 into distinct patches, review became
much easier. The same can be done here (IMHO). The above bullet points
are each conceptually distinct (even if they may happen to overlap
textually in some cases).

> A "split ' '" is turned into a "split / /", which changes its behaviour: the old
> method matched a run of whtespaces (/\s*/), while the new one will match a
> single whitespace, which is what we want here.

The phrase "which is what we want here" does not convey much. Taking a
hint from Junio's responses explaining why he finds this subtle change
acceptable, perhaps a paragraph like this might be appropriate:

  Note the special case split(' ') for which Perl splits on runs of
  whitespace, whereas split(/ /) splits on exactly one space.  In
  other contexts, changing split(' ') to split(/ /) could potentially
  be a regression, however, here, when parsing the output of
  "rev-list --parents", whose output SHA-1's are each separated by a
  single space, splitting on a single space is perfectly correct.

More comments below.

> Signed-off-by: Célestin Matte <celestin.matte@xxxxxxxxxx>
> Signed-off-by: Matthieu Moy <matthieu.moy@xxxxxxxxxxxxxxx>
> ---
>  contrib/mw-to-git/git-remote-mediawiki.perl |   20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
> index a5c963b..482cd95 100755
> --- a/contrib/mw-to-git/git-remote-mediawiki.perl
> +++ b/contrib/mw-to-git/git-remote-mediawiki.perl
> @@ -121,7 +121,7 @@ chomp($dumb_push);
>  $dumb_push = ($dumb_push eq "true");
>
>  my $wiki_name = $url;
> -$wiki_name =~ s/[^\/]*:\/\///;
> +$wiki_name =~ s{[^/]*://}{};
>  # If URL is like http://user:password@xxxxxxxxxxx/, we clearly don't
>  # want the password in $wiki_name. While we're there, also remove user
>  # and '@' sign, to avoid author like MWUser@HTTPUser@xxxxxxxx
> @@ -565,7 +565,7 @@ sub mediawiki_smudge {
>
>  sub mediawiki_clean_filename {
>         my $filename = shift;
> -       $filename =~ s/@{[SLASH_REPLACEMENT]}/\//g;
> +       $filename =~ s{$SLASH_REPLACEMENT}{/}g;

The change from SLASH_REPLACEMENT to $SLASH_REPLACEMENT should have
happened in patch 2/22 where 'constant' was replaced with 'Readonly'.

>         # [, ], |, {, and } are forbidden by MediaWiki, even URL-encoded.
>         # Do a variant of URL-encoding, i.e. looks like URL-encoding,
>         # but with _ added to prevent MediaWiki from thinking this is
> @@ -579,7 +579,7 @@ sub mediawiki_clean_filename {
>
>  sub mediawiki_smudge_filename {
>         my $filename = shift;
> -       $filename =~ s/\//@{[SLASH_REPLACEMENT]}/g;
> +       $filename =~ s{/}{$SLASH_REPLACEMENT}g;

Ditto regarding patch 2/22.

>         $filename =~ s/ /_/g;
>         # Decode forbidden characters encoded in mediawiki_clean_filename
>         $filename =~ s/_%_([0-9a-fA-F][0-9a-fA-F])/sprintf("%c", hex($1))/ge;
> @@ -762,7 +762,7 @@ sub get_more_refs {
>         my @refs;
>         while (1) {
>                 my $line = <STDIN>;
> -               if ($line =~ m/^$cmd (.*)$/) {
> +               if ($line =~ /^$cmd (.*)$/) {
>                         push(@refs, $1);
>                 } elsif ($line eq "\n") {
>                         return @refs;
> @@ -1168,11 +1168,11 @@ sub mw_push_revision {
>                 my @local_ancestry = split(/\n/, run_git("rev-list --boundary --parents $local ^$parsed_sha1"));
>                 my %local_ancestry;
>                 foreach my $line (@local_ancestry) {
> -                       if (my ($child, $parents) = $line =~ m/^-?([a-f0-9]+) ([a-f0-9 ]+)/) {
> -                               foreach my $parent (split(' ', $parents)) {
> +                       if (my ($child, $parents) = $line =~ /^-?([a-f0-9]+) ([a-f0-9 ]+)/) {
> +                               foreach my $parent (split(/ /, $parents)) {
>                                         $local_ancestry{$parent} = $child;
>                                 }
> -                       } elsif (!$line =~ m/^([a-f0-9]+)/) {
> +                       } elsif (!$line =~ /^([a-f0-9]+)/) {
>                                 die "Unexpected output from git rev-list: $line";
>                         }
>                 }
> @@ -1190,10 +1190,10 @@ sub mw_push_revision {
>                 # history (linearized with --first-parent)
>                 print STDERR "Warning: no common ancestor, pushing complete history\n";
>                 my $history = run_git("rev-list --first-parent --children $local");
> -               my @history = split('\n', $history);
> +               my @history = split(/\n/, $history);

This looks fishy. In Perl, '\n' is not a newline, but instead a
literal backslash followed by an "n". The output of "rev-list
--first-parent" is line-oriented, so this looks like a bug in the
original code, which you may want to fix it in a separate patch to
make it clear that this is an actual bug fix, distinct from the
otherwise mechanical change of split('x') to split(/x/).

>                 @history = @history[1..$#history];
>                 foreach my $line (reverse @history) {
> -                       my @commit_info_split = split(/ |\n/, $line);
> +                       my @commit_info_split = split(/[ \n]/, $line);
>                         push(@commit_pairs, \@commit_info_split);
>                 }
>         }
> @@ -1272,7 +1272,7 @@ sub get_mw_namespace_id {
>                 # Look at configuration file, if the record for that namespace is
>                 # already cached. Namespaces are stored in form:
>                 # "Name_of_namespace:Id_namespace", ex.: "File:6".
> -               my @temp = split(/[\n]/, run_git("config --get-all remote."
> +               my @temp = split(/\n/, run_git("config --get-all remote."
>                                                 . $remotename .".namespaceCache"));
>                 chomp(@temp);
>                 foreach my $ns (@temp) {
> --
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]