Search Postgresql Archives

Re: [E] Regexp_replace bug / does not terminate on long strings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Aug 20, 2021, at 9:52 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> 
> "a*" is easy.  "(a*)\1" is less easy --- if you let the a* consume the
> whole string, you will not get a match, even though one is possible.
> In general, backrefs create a mess in what would otherwise be a pretty
> straightforward concept :-(.

The following queries take radically different time to run:

\timing
select regexp_replace(
  repeat('someone,one,one,one,one,one,one,', 60),
  '(?<=^|,)([^,]+)(?:,\1)+(?=$|,)',
  '\1', -- replacement
  'g'  -- apply globally (all matches)
);

Time: 16476.529 ms (00:16.477)

select regexp_replace(
  repeat('someone,one,one,one,one,one,one,', 60),
  '(?<=^|,)([^,]+)(?:,\1){5}(?=$|,)',
  '\1', -- replacement
  'g'  -- apply globally (all matches)
);

Time: 1.452 ms

The only difference in the patterns is the + vs. the {5}.  It looks to me like the first pattern should greedily match five ",one" matches and be forced to stop since ",someone" doesn't match, and the second pattern should grab the five ",one" matches it was told to grab and not try to grab the ",someone", but other than that, they should be performing the same work.  I don't see why the performance should be so different.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company









[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux