Re: Counting the number of repeated phrases in a column

On Tue, Jan 25, 2022 at 11:10 AM Shaozhong SHI <shishaozhong@xxxxxxxxx> wrote:
>
> Standard Postgres lacks a function to do the following:
>
> It is easy to count the number of occurrences of words, but it is rather difficult to count the number of occurrences of phrases.
>
> For instance:
>
> A cell of value: 'Hello World' means 1 occurrence of a phrase.
>
> A cell of value: 'Hello World World Hello' means no occurrence of any repeated phrase.
>
> But a cell of value: 'Hello World World Hello Hello World' means 2 occurrences of 'Hello World'.
>
> 'The City of London, London' also has no occurrences of any repeated phrase.
>
> Does anyone have such a function to count the number of occurrences of any repeated phrase?

Let's define a phrase as a sequence of two or more words, delimited by
spaces.  You could find it with something like:

with s as (select 'Hello World Hello World' as sentence)
select
  phrase,
  array_upper(string_to_array((select sentence from s), phrase), 1) - 1 as occurrences
from
(
  select array_to_string(x, ' ') as phrase
  from
  (
    select distinct v[a:b]  x
    from regexp_split_to_array((select sentence from s), ' ') v
    cross join lateral generate_series(1, array_upper(v, 1)) a
    cross join lateral generate_series(a + 1, array_upper(v, 1)) b
  ) q
) q;
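
The query above lists every candidate phrase, including those that occur
only once. If only repeated phrases are wanted, as in the original question,
a filter on the count might look like the sketch below. The CTE name
`phrases`, the subquery alias `counted`, and the column aliases `words`, `i`
and `j` are illustrative names introduced here, not part of the query above.
Run against the third example from the question, it is meant to report the
2 occurrences of 'Hello World'.

```sql
-- Sketch: same enumeration of word ranges, keeping only repeated phrases.
with s as (
  select 'Hello World World Hello Hello World' as sentence
),
phrases as (
  -- every distinct run of two or more consecutive words
  select distinct array_to_string(words[i:j], ' ') as phrase
  from s
  cross join lateral regexp_split_to_array(s.sentence, ' ') as w(words)
  cross join lateral generate_series(1, array_upper(words, 1)) as a(i)
  cross join lateral generate_series(i + 1, array_upper(words, 1)) as b(j)
)
select phrase, occurrences
from (
  select
    p.phrase,
    -- splitting on the phrase yields (matches + 1) pieces
    array_upper(string_to_array(s.sentence, p.phrase), 1) - 1 as occurrences
  from phrases p
  cross join s
) counted
where occurrences > 1
order by occurrences desc, phrase;
```

Note that string_to_array does plain substring matching and counts
non-overlapping matches, so 'Hello World' would also match inside
'Hello Worlds'.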

This would be slow for large sentences, obviously, since the number of
candidate phrases grows quadratically with the number of words, and you'd
probably want to preprocess the string first, stripping punctuation and such.
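
As a rough sketch of that preparation step, one option is to lowercase the
text and collapse every run of non-alphanumeric characters into a single
space before splitting. The lowercasing and the ASCII-only character class
are assumptions made for this example; adjust them to whatever the data
needs.

```sql
-- Sketch: normalize a sentence before splitting it into words.
select regexp_split_to_array(
         btrim(
           -- replace each run of characters other than a-z and 0-9 with one space
           regexp_replace(lower(sentence), '[^a-z0-9]+', ' ', 'g')
         ),
         ' '
       ) as words
from (select 'The City of London, London' as sentence) s;
```

With this input the comma is dropped, so the words array no longer carries
'london,' as a separate token from 'london'.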

merlin




