Re: Regular Expression For Duplicate Words

Shaozhong SHI <shishaozhong@xxxxxxxxx> · Thu, 3 Feb 2022 21:09:03 +0000

Hi, Peter,  Interesting.

On Thu, 3 Feb 2022 at 19:48, Peter J. Holzer <hjp-pgsql@xxxxxx> wrote:
On 2022-02-02 08:00:00 +0000, Shaozhong SHI wrote:

> regex - Regular Expression For Duplicate Words - Stack Overflow

> 

> Is there any example in Postgres?

It's pretty much the same as with other regexp dialects: use word
boundaries and a word character class to match any word and then use a
backreference to match a duplicate word. All the building blocks are
described on
https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
and except for [[:<:]] and [[:>:]] for the word boundaries, they are
also pretty standard.

So

[[:<:]]        start of word
([[:alpha:]]+) one or more alphabetic characters in a capturing group
[[:>:]]        end of word
\W+            one or more non-word characters
[[:<:]]        start of word
\1             the content of the first (and only) capturing group
[[:>:]]        end of word

All together:

select * from t where t ~ '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]';
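
A minimal sketch of how the pattern could be tried out without an
existing table. The CTE name "samples", the column name "txt" and the
sample strings are made up for illustration. It also assumes
standard_conforming_strings is on (the default), so the backslashes
reach the regexp engine unchanged, and it uses \W+ as in the breakdown
above.

```sql
-- Hypothetical sample rows; "samples" and "txt" are illustrative names only.
WITH samples(txt) AS (
    VALUES ('this is is a test'),    -- "is is" is a duplicate: expect true
           ('the the cat'),          -- "the the" is a duplicate: expect true
           ('no duplicates here'),   -- no repeated word: expect false
           ('is island')             -- "is" is only a prefix of "island": expect false
)
SELECT txt,
       -- word start, captured alphabetic word, word end,
       -- one or more non-word characters, then the same word again
       txt ~ '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]' AS has_duplicate
FROM samples;
```

The last row shows why the trailing [[:>:]] matters: without it, the
backreference would also match the "is" at the start of "island".
Note that ~ is case-sensitive, so "The the" would not count as a
duplicate with this operator.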

Give a good example if you can.

Regards,

David