Re: Counting the number of repeated phrases in a column

Ivan Panchenko <i.panchenko@xxxxxxxxxxxxxx> · Wed, 26 Jan 2022 00:33:03 +0300

On 26.01.2022 00:21, benj.dev@xxxxxxxxxxx wrote:
Le 25/01/2022 à 18:10, Shaozhong SHI a écrit :
There is a short of a function in the standard Postgres to do the 
following:

It is easy to count the number of occurrence of words, but it is 
rather difficult to count the number of occurrence of phrases.

For instance:

A cell of value:  'Hello World' means 1 occurrence a phrase.

A cell of value: 'Hello World World Hello' means no occurrence of any 
repeated phrase.

But, A cell of value: 'Hello World World Hello Hello World' means 2 
occurrences of 'Hello World'.

'The City of London, London' also has no occurrences of any repeated 
phrase.

Anyone has got such a function to check out the number of occurrence 
of any repeated phrases?

Regards,

David

Don't know if it's exactly what you want, but you can replace all 
occurence of the phrase in the text by empty string and compute the 
diff between the initial and the result and next divide by the length 
of your phrase.

Example :
WITH x AS (SELECT 'toto like tata and toto like titi and toto like 
tutu' , 'toto like' phrase)
SELECT (char_length(texte) - char_length(replace(texte, phrase, ''))) 
/ char_length(phrase) AS nb_occurence
FROM x

This works if the user knows the phrase. As far as I understood, the 
phrase is not known, and user wants to count number of repeats of any 
phrases.
Of course this can be done with recursive CTE. Split into words, 
generate all phrases (AFAIK requires recursion), then group and count.

But probably in PL/Perl this could be done  more effectively.