"Markhof, Ingolf" <ingolf.markhof@xxxxxxxxxxxxxx> writes: > BRIEF: > regexp_replace(source,pattern,replacement,flags) needs very (!) long to > complete or does not complete at all (?!) for big input strings (a few k > characters). (Oracle SQL completes the same in a few ms) Regexps containing backrefs are inherently hard --- every engine has strengths and weaknesses. I doubt it'd be hard to find cases where our engine is orders of magnitude faster than Oracle's; but you've hit on a case where the opposite is true. The core of the problem is that it's hard to tell how much of the string could be matched by the (,\1)* subpattern. In principle, *all* of the remaining string could be, if it were N repetitions of the initial word. Or it could be N-1 repetitions followed by one other word, and so on. The difficulty is that since our engine guarantees to find the longest feasible match, it tries these options from longest to shortest. Usually the actual match (if any) will be pretty short, so that you have O(N) wasted work per word, making the runtime at least O(N^2). I think your best bet is to not try to eliminate multiple duplicates at a time. Get rid of one dup at a time, say by str := regexp_replace(str, '([^,]+)(,\1)?($|,)', '\1\3', 'g'); and repeat till the string doesn't get any shorter. I did come across a performance bug [1] while poking at this, but alas fixing it doesn't move the needle very much for this example. regards, tom lane [1] https://www.postgresql.org/message-id/1808998.1629412269%40sss.pgh.pa.us