Re: [PATCH v3 03/22] build-aux: rewrite po file minimizer in Python

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 24, 2019 at 03:58:44PM +0100, Daniel P. Berrangé wrote:
> As part of an goal to eliminate Perl from libvirt build tools,
> rewrite the minimize-po.pl tool in Python.
>
> This was a straight conversion, manually going line-by-line to
> change the syntax from Perl to Python. Thus the overall structure
> of the file and approach is the same.
>
> Signed-off-by: Daniel P. Berrangé <berrange@xxxxxxxxxx>
> ---
>  Makefile.am              |  2 +-
>  build-aux/minimize-po.pl | 37 -------------------------
>  build-aux/minimize-po.py | 60 ++++++++++++++++++++++++++++++++++++++++
>  po/Makefile.am           |  2 +-
>  4 files changed, 62 insertions(+), 39 deletions(-)
>  delete mode 100755 build-aux/minimize-po.pl
>  create mode 100755 build-aux/minimize-po.py
>
> diff --git a/Makefile.am b/Makefile.am
> index 17448a914e..8f688d40d0 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -45,7 +45,7 @@ EXTRA_DIST = \
>    build-aux/check-spacing.pl \
>    build-aux/gitlog-to-changelog \
>    build-aux/header-ifdef.pl \
> -  build-aux/minimize-po.pl \
> +  build-aux/minimize-po.py \
>    build-aux/mock-noinline.pl \
>    build-aux/prohibit-duplicate-header.pl \
>    build-aux/useless-if-before-free \
> diff --git a/build-aux/minimize-po.pl b/build-aux/minimize-po.pl
> deleted file mode 100755
> index 497533a836..0000000000
> --- a/build-aux/minimize-po.pl
> +++ /dev/null
> @@ -1,37 +0,0 @@
> -#!/usr/bin/perl
> -
> -my @block;
> -my $msgstr = 0;
> -my $empty = 0;
> -my $unused = 0;
> -my $fuzzy = 0;
> -while (<>) {
> -    if (/^$/) {
> -        if (!$empty && !$unused && !$fuzzy) {
> -            print @block;
> -        }
> -        @block = ();
> -        $msgstr = 0;
> -        $fuzzy = 0;
> -        push @block, $_;
> -    } else {
> -        if (/^msgstr/) {
> -            $msgstr = 1;
> -            $empty = 1;
> -        }
> -        if (/^#.*fuzzy/) {
> -            $fuzzy = 1;
> -        }
> -        if (/^#~ msgstr/) {
> -            $unused = 1;
> -        }
> -        if ($msgstr && /".+"/) {
> -            $empty = 0;
> -        }
> -        push @block, $_;
> -    }
> -}
> -
> -if (@block && !$empty && !$unused) {

I guess the fact !$fuzzy was missing in this condition was a bug that the new
python code doesn't suffer from, right?

> -    print @block;
> -}
> diff --git a/build-aux/minimize-po.py b/build-aux/minimize-po.py
> new file mode 100755
> index 0000000000..5046bacede
> --- /dev/null
> +++ b/build-aux/minimize-po.py
> @@ -0,0 +1,60 @@
> +#!/usr/bin/env python
> +#
> +# Copyright (C) 2018-2019 Red Hat, Inc.
> +#
> +# This library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# This library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with this library.  If not, see
> +# <http://www.gnu.org/licenses/>.
> +
> +from __future__ import print_function
> +
> +import re
> +import sys
> +
> +block = []
> +msgstr = False
> +empty = False
> +unused = False
> +fuzzy = False
> +
> +strprog = re.compile(r'''.*".+".*''')

question 1) what's the benefit of compiling a regex and using it only once? Btw
python does cache every pattern passed to re.match (and friends) so compilation
IMO hardly ever makes sense unless you're doing 1000s of searches for the same
pattern in which case the latency would naturally accumulate.

question 2) why do we need the '''.* and .*''' parts compared to the original
perl regex? I'll go ahead and assume it's because re.match matches at the
beginning of a string by default, in which case, shouldn't we use re.search
which matches anywhere (that's what perl does by default) instead?

Erik

> +
> +for line in sys.stdin:
> +    line = line.rstrip("\n")
> +
> +    if line == "":
> +        if not empty and not unused and not fuzzy:
> +            for b in block:
> +                print(b)
> +
> +        block = []
> +        msgstr = False
> +        fuzzy = False
> +        block.append(line)
> +    else:
> +        if line.startswith("msgstr"):
> +            msgstr = True
> +            empty = True
> +
> +        if line[0] == '#' and line.find("fuzzy") != -1:
> +            fuzzy = True
> +        if line.startswith("#~ msgstr"):
> +            unused = True
> +        if msgstr and strprog.match(line):
> +            empty = False
> +
> +        block.append(line)
> +
> +if not empty and not unused and not fuzzy:
> +    for b in block:
> +        print(b)
> diff --git a/po/Makefile.am b/po/Makefile.am
> index da117edbd5..90025b15dd 100644
> --- a/po/Makefile.am
> +++ b/po/Makefile.am
> @@ -58,7 +58,7 @@ update-mini-po: $(POTFILE)
>  	  $(MSGMERGE) --no-location --no-fuzzy-matching --sort-output \
>  	    $$lang.po $(POTFILE) | \
>  	  $(SED) $(SED_PO_FIXUP_ARGS) | \
> -	  $(PERL) $(top_srcdir)/build-aux/minimize-po.pl > \
> +	  $(RUNUTF8) $(PYTHON) $(top_srcdir)/build-aux/minimize-po.py > \
>  	    $(srcdir)/$$lang.mini.po ; \
>  	done
>
> --
> 2.21.0
>
> --
> libvir-list mailing list
> libvir-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/libvir-list

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux