Search Postgresql Archives

Re: How to use full-text search URL parser to filter query results by domain name?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 07.04.2019 07:06, Jess Wren wrote:
However, I can't figure out how I would integrate this into the above query to filter out duplicate domains from the results. And because this is the docs for "testing and debugging text search <https://www.postgresql.org/docs/11/textsearch-debugging.html#TEXTSEARCH-PARSER-TESTING>", I don't know if this use of `ts_parse()` is even related to how the URL parser is intended to be used in practice.

How would I use the "host" parser in my query above to return one row per domain? Also, how would I appropriately index the "links" table for "host" and "url" token lookup?

I think it is normal to use ts_parse(). And I suppose you might use windows functions.

For example, you have table links:

=# create table links (score int, link text);
=# insert into links values
  (1, 'http://www.foo.com/bar'),
  (2, 'http://www.foo.com/foo'),
  (2, 'http://www.bar.com/foo'),
  (1, 'http://www.bar.com/bar');

You can use the following query:

=# with l as (
  select score, token, link,
    rank() over (partition by token order by score) as rank
  from links,
    lateral ts_parse('default', link)
  where tokid = 6)
select score, token, link from l where rank = 1;
 score |    token    |          link
-------+-------------+------------------------
     1 | www.bar.com | http://www.bar.com/bar
     1 | www.foo.com | http://www.foo.com/bar

It is just the idea, probably the query might be simpler.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux