Re: [E] Regexp_replace bug / does not terminate on long strings

Mark Dilger <mark.dilger@xxxxxxxxxxxxxxxx> · Fri, 20 Aug 2021 13:26:37 -0700

> On Aug 20, 2021, at 12:51 PM, Miles Elam <miles.elam@xxxxxxxxxxxxxx> wrote:
> 
> Unbounded ranges seem like a problem.

Seems so.  The problem appears to be in regcomp.c's repeat() function, which handles {1,SOME} differently than {1,INF}.

> Seems worth trying a range from 1 to N where you play around with N to find your optimum performance/functionality tradeoff. {1,20} is like '+' but clamps at 20.

For any such value (5, 20, whatever) there can always be a string with more repeated words than the number you've chosen, and the call to regexp_replace won't do what you want.  There is also an upper bound at work, because values above 255 will draw a regex compilation error.  So it seems worth a bit of work to determine why the regex engine has bad performance in these cases.
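
To make the first point concrete, here is a minimal sketch in Python rather than SQL. It assumes a hypothetical pattern (not the OP's actual regex) that matches a word followed by one to three further copies of itself, with 3 standing in for N. Python's re module is a different engine from PostgreSQL's, so this only illustrates the matching semantics, not the performance problem.

```python
import re

# Hypothetical stand-in for the OP's pattern: a word followed by
# 1..3 further copies of itself, separated by single spaces.
# The upper bound 3 plays the role of N in the discussion above.
BOUNDED = re.compile(r"\b(\w+)(?: \1\b){1,3}")


def collapse_once(text: str) -> str:
    """Replace each bounded run of a repeated word with one copy, in a single pass."""
    return BOUNDED.sub(r"\1", text)


# A run that fits inside the bound is fully collapsed.
short_result = collapse_once("the the the cat")

# A run longer than the bound is only partly collapsed in one pass:
# the first match consumes four copies, the second consumes the last two,
# and the two replacements end up adjacent to each other.
long_result = collapse_once("the the the the the the cat")
```

With these inputs, short_result is "the cat" while long_result is still "the the cat", so whatever bound is chosen, a sufficiently long run survives a single call.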

It sounds like the OP will be working around this problem by refactoring to call regexp_replace multiple times until all repeats are eradicated, but I don't think such workarounds should be necessary.
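
For reference, that workaround amounts to iterating to a fixed point. A minimal sketch in Python, again assuming a hypothetical bounded pattern rather than the OP's real one, and using Python's re module only to show the control flow (in PostgreSQL the loop body would be a regexp_replace call with the 'g' flag):

```python
import re

# Hypothetical bounded pattern: a word followed by 1..3 further copies
# of itself. The bound is an assumption made for this illustration.
BOUNDED = re.compile(r"\b(\w+)(?: \1\b){1,3}")


def collapse_repeats(text: str) -> str:
    """Apply the bounded replacement repeatedly until the text stops changing."""
    while True:
        reduced = BOUNDED.sub(r"\1", text)
        if reduced == text:
            # Fixed point reached: no repeated words remain.
            return text
        text = reduced


result = collapse_repeats("the the the the the the cat sat sat")
```

Each pass strictly shortens the string whenever it changes anything, so the loop terminates; the cost is one extra pass at the end to confirm that nothing is left to replace.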

--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company