Re: [PATCH v3 03/22] build-aux: rewrite po file minimizer in Python

Erik Skultety <eskultet@xxxxxxxxxx> · Fri, 27 Sep 2019 09:22:13 +0200

On Thu, Sep 26, 2019 at 04:38:49PM +0100, Daniel P. Berrangé wrote:
> On Thu, Sep 26, 2019 at 05:34:49PM +0200, Ján Tomko wrote:
> > On Thu, Sep 26, 2019 at 02:16:04PM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Sep 26, 2019 at 12:39:39PM +0200, Erik Skultety wrote:
> > > > On Tue, Sep 24, 2019 at 03:58:44PM +0100, Daniel P. Berrangé wrote:
> > > > question 1) what's the benefit of compiling a regex and using it only once? Btw
> > > > python does cache every pattern passed to re.match (and friends) so compilation
> > > > IMO hardly ever makes sense unless you're doing 1000s of searches for the same
> >
> > Some of the scripts here are run on the whole libvirt codebase so that
> > is the case here. For example just removing the pre-compilation of
> > regexes for comments from the spacing check script bumped the execution
> > time from 6.5s to 7.4s
> >
> > Sadly, the one script where pre-compilation matters the most is the one
> > where separating them puts them far away from the usage to not fit on
> > one screen.
>
> I could do a little custom function that caches all regexes
>
>   recache = {}
>
>   def research(regex, line):
>     global recache
>     if regex not in recache:
>       recache[regex] = re.compile(regex)
>     return recache[regex].search(line)

I'm not sure how ^this would solve the slowdown Jano is seeing, since it is
exactly what python should already be doing internally. IOW the slowdown Jano
reported is most likely caused by the cache lookups themselves, which I don't
think our own custom cache would avoid. So we probably do want to keep the
pre-compilation in, even though I personally don't mind the ~1 sec penalty here
(compared to the 4x slowdown in the next patch, which I think we do need to
fix).
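
FWIW, a rough sketch of how the three variants could be compared with timeit.
The pattern and the input line are made up for illustration and are not taken
from any of the scripts in this series, and the function names are mine:

```python
import re
import timeit

# Hypothetical pattern and input line, chosen only for illustration.
PATTERN = r"/\*.*?\*/"
LINE = "int x = 0; /* counter */"

# Variant 1: compile once up front.
COMPILED = re.compile(PATTERN)

# Variant 3: our own dict cache, as in the helper proposed above.
recache = {}


def search_precompiled(line):
    # No per-call lookup; the compiled pattern object is used directly.
    return COMPILED.search(line)


def search_module(line):
    # Variant 2: re.search() consults the re module's internal cache
    # for the pattern string on every call.
    return re.search(PATTERN, line)


def search_custom_cache(line):
    # One dict lookup per call, then the compiled pattern's search().
    if PATTERN not in recache:
        recache[PATTERN] = re.compile(PATTERN)
    return recache[PATTERN].search(line)


def bench(number=100000):
    # Seconds per variant; absolute numbers depend on the machine.
    return {
        fn.__name__: timeit.timeit(lambda: fn(LINE), number=number)
        for fn in (search_precompiled, search_module, search_custom_cache)
    }


if __name__ == "__main__":
    for name, secs in bench().items():
        print(f"{name}: {secs:.3f}s")
```

Running that on one of the real patterns from the spacing check should show
whether the custom dict actually gets us any closer to the pre-compiled case
than plain re.search() does.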

Erik
