Search Postgresql Archives

Re: Full text search on partial URLs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 11/06/2013 02:04 PM, bricklen wrote:

On Wed, Nov 6, 2013 at 10:53 AM, Zev Benjamin
<zev-pgsql@xxxxxxxxxxxxxxxxx <mailto:zev-pgsql@xxxxxxxxxxxxxxxxx>> wrote:

    On 11/06/2013 01:47 PM, bricklen wrote:


        On Wed, Nov 6, 2013 at 10:45 AM, Zev Benjamin
        <zev-pgsql@xxxxxxxxxxxxxxxxx
        <mailto:zev-pgsql@xxxxxxxxxxxxxxxxx>
        <mailto:zev-pgsql@__strangersgate.com
        <mailto:zev-pgsql@xxxxxxxxxxxxxxxxx>>> wrote:

             Hi,

             I have Postgres full text search set up for my application
        and it's
             been working great!  However, my users would like their
        searches to
             turn up parts of URLs.  For example, they would like a
        search for
             "foobar" to turn up a document that contains the string
             "http://example.com/foobar/____blah
        <http://example.com/foobar/__blah>
        <http://example.com/foobar/__blah <http://example.com/foobar/blah>>"
             (and similarly for queries like "example" and "blah).  With the
             default dictionaries for host, url, and url_path, the
        search query
             would have to contain the complete host or url path.

             What is the best way to accomplish this?  Should I be
        looking at
             building a custom dictionary that breaks down hosts and
        urls or is
             there something simpler I can do?


        Have you looked into trigrams?
        http://www.postgresql.org/__docs/current/static/pgtrgm.__html
        <http://www.postgresql.org/docs/current/static/pgtrgm.html>


    I've looked at it in the context of adding fuzzy search.  But my
    understanding is that doing a fuzzy search here would only work if
    the query were a significant fraction of, say, the url path.  For
    example, I would expect a fuzzy search of "foobar" on "/foobar/x" to
    return a high similarity, but a fuzzy search of "foobar" on
    "/foobar/some/very/long/path/__x" to have a low similarity.

    Or are you suggesting using trigrams in a different way?


Yeah, I was thinking more along the lines of allowing wildcard
searching, not similarity.

Eg.
CREATE INDEX yourtable_yourcol_gist_fbi ON yourtable using GIST (
yourcol gist_trgm_ops );
select * from yourtable where yourcol ~~ '%foobar%';


Hrm. That might work. So the application-level search functionality would be the union of tsearch and trigram wildcard matching.

If anyone else has other ideas, I'd be interested in hearing them as well.


Thanks,
Zev


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux