Search Postgresql Archives

Re: Full Text Search 101?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jonathan;

>
> I posted to this forum once before and was able to receive help.
> Thanks again!
>
> I'm trying to implement full text search capabilities.  Basically, I
> have a very simple "data catalog" type of website (http://
> gis.drcog.org/datacatalog), where the user can type in a word or words
> to search for records that match the criteria.  I have a table with a
> few fields, but I want to be able to search/index two fields -- the
> "name" of the dataset and the "description."  (or more, if I can index
> fields from other tables too, that would be great).  I'd like to be
> able to use a full text search to rank the results in terms of
> relevance.

You can set up an index per table, but I've found that having a single vector table with bridge tables to the data I want to search works well.
This would be in the style of:

create table vectors (
 id serial primary key,
 vector tsvector not null
);

create table interesting_data (
 id serial primary key,
 textual text not null
);

CREATE TABLE vector_to_interesting (
 v_id int not null references vectors(id),
 i_id int not null references interesting_data(id)
);

And then

SELECT id.*
  FROM interesting_data itd,
       vectors v,
       vector_to_interesting itv
 WHERE v.id = itv.v_id
   AND itv.i_id = itd.id
   AND v.vector @@ to_tsquery('searchpattern');

Allowing for some procedures around that that return multiple row types, to the client software.

>
> What is the best way to handle this?  I've read through the PostgreSQL
> documentation and don't quite understand it although I'm trying to
> understand and am 'playing around' with this on a development server.
> Is the default text search configuration enough? I've tried setting my
> own configuration but get errors about not finding .dict dictionary
> files?  I have a default install of PostgreSQL 8.4.
>
> Also, I've created a tsvector column and created indexes, but it
> didn't seem to delete stop words from the indexes.  Should the stop
> words be indexed?

The default "english" configuration in PostgreSQL should have done this. Building your own configuration tends towards being a bit more advanced, and "english" should suit most needs.

>
> Also, I don't quite understand how to create indexes but rank certain
> words as being more important than others, for instance, maybe having
> words that come from the "name" column carrying more importance than
> words coming from the "description" column.

For part of this, there's ts_rank(tsvector, tsquery), as well as the relative weighting system in the textsearch modules. Unfortunately, I don't have much experience with the relative weighting, but, ts_rank() should get you partly there.

I also found an interesting page on relative weighting @
http://www.postgresonline.com/special_feature.php?sf_name=postgresql83tsearch_cheatsheet&outputformat=html

that might be useful for you.

> Finally, I'd like
> "watersheds" to come up when someone searches for "water" so I don't
> think I have this configured properly, because this record is not
> returned.

Well, there's two ways to achieve this.
The first is to manually the stopwords, which is getting into the custom configuration and a bit on the more advanced side.
The other is that in PG8.4, you can do

to_tsquery('water:*')

which will tell the search parser to do a partial match, which would return "watershed", in this instance. My testing has shown it to be a bit slower (30ms, vs 15ms for a non-partial search), but not egregiously slow.

>
> Is there a good tutorial or maybe documentation that is a bit easier
> to understand?  I know my database is far from complicated but I can't
> seem to find a configuration that works well.  When I try ranking my
> results, most of the results end up with the same rank, so something
> must be wrong?
>
> Can someone point me in the right direction?
>
> Thanks for the help.  I appreciate it.
>

Hope this helps. :)

Regards,
--
Aurynn Shaw

The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 ext 103
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

ashaw@xxxxxxxxxxxxxxxxx

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux