On Thu, Jan 27, 2022 at 11:56 AM <benj.dev@xxxxxxxxxxx> wrote: > Le 27/01/2022 à 18:35, Merlin Moncure a écrit : > > select distinct array_to_string(v[a:b], ' ') phrase, count(*) as occurrences > > from > > ( > > select array_agg(t) v > > from > > ( > > select trim(replace(unnest(v), E'\n', '')) t > > from regexp_split_to_array(<sentence>, ' ') v > > ) q > > where length(t) > 1 > > ) q > > cross join lateral generate_series(1, array_upper(v, 1)) a > > cross join lateral generate_series(a + 1, array_upper(v, 1)) b > > group by 1 > > having count(*) > 1; > > > > We are definitely in N^2 space here, so look for things to start > > breaking down for sentences > 1000 words. > > > > merlin > > > > (for better complexity) you may search about "Ukkonen suffix tree" > Similar problem as yours : > https://www.geeksforgeeks.org/suffix-tree-application-3-longest-repeated-substring/?ref=lbp Yep. Many problems like this are well solved in imperative languages and will fit poorly into SQL quase-functional space. That implementation could probably be converted to pl/pgsql pretty easily, or a 'sql + tables' variant as a fun challenge. It also slightly exploits the fact that only the most repeated needle is returned, rather than all of them. Having the need to have single statement stateless SQL solutions to interesting problems comes up all the time in common development practice though for simplicity's sake even if there are better approaches out there. It's also fun. merlin