Re: Shrinking TSvectors

On 05/04/2016 15:15, Artur Zakirov wrote:
On 05.04.2016 14:37, Howard News wrote:
Hi,

Does anyone have any pointers for shrinking tsvectors?

I have looked at the contents of some of these fields and they contain
many details that are not needed. For example...

"'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944
'-9972':945 '/partners/application.html':222
'/partners/program/program-agreement.pdf':271
'/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087
'1':753,771 '12':366 '14':66 (...)"

I am not interested in keeping the numbers or urls in the indexes.

Thanks,

Howard.



Hello,

You need to create a new text search configuration. Here is an example of the commands:

CREATE TEXT SEARCH CONFIGURATION public.english_cfg (
    PARSER = default
);
ALTER TEXT SEARCH CONFIGURATION public.english_cfg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
        word, hword, hword_part
    WITH pg_catalog.english_stem;

Instead of the "pg_catalog.english_stem" you can use your own dictionary.
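
As a rough sketch of the "your own dictionary" option: the dictionary name public.my_english_stem below is made up for illustration, and the example assumes the stock snowball template and the english stop word file shipped with PostgreSQL are available.

```sql
-- Hypothetical dictionary name; snowball is the stock stemming template.
CREATE TEXT SEARCH DICTIONARY public.my_english_stem (
    TEMPLATE = snowball,
    Language = english,
    StopWords = english
);

-- Point the word-like token types of the new configuration at it.
ALTER TEXT SEARCH CONFIGURATION public.english_cfg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
        word, hword, hword_part
    WITH public.my_english_stem;
```

Pointing StopWords at a different stop word file is one way to filter additional words, if that is what you need.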

Let's compare the new configuration with the built-in configuration "pg_catalog.english":

postgres=# select to_tsvector('english_cfg', 'home -9972 /partners/application.html /partners/program/program-agreement.pdf');
 to_tsvector
-------------
 'home':1
(1 row)

postgres=# select to_tsvector('english', 'home -9972 /partners/application.html /partners/program/program-agreement.pdf');
                                          to_tsvector
-----------------------------------------------------------------------------------------------
 '-9972':2 '/partners/application.html':3 '/partners/program/program-agreement.pdf':4 'home':1
(1 row)
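
To apply this to stored data, something along the following lines should work. The table documents and its columns body and body_tsv are assumptions made up for this sketch, so substitute your own names. pg_column_size is used only as a rough indicator of how much smaller the new vectors are.

```sql
-- Hypothetical table: documents(body text, body_tsv tsvector).
-- Rough size comparison of the two configurations on a sample of rows.
SELECT pg_column_size(to_tsvector('english', body))            AS default_bytes,
       pg_column_size(to_tsvector('public.english_cfg', body)) AS trimmed_bytes
FROM documents
LIMIT 10;

-- Rebuild the stored tsvectors with the trimmed configuration.
UPDATE documents
SET body_tsv = to_tsvector('public.english_cfg', body);

-- Queries should then use the same configuration.
SELECT count(*)
FROM documents
WHERE body_tsv @@ to_tsquery('public.english_cfg', 'home');
```

Any trigger or application code that maintains the tsvector column would need to use the new configuration as well, otherwise new rows keep the numbers and URLs.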


You can get some additional information about configurations using \dF+:

postgres=# \dF+ english
Text search configuration "pg_catalog.english"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+--------------
 asciihword      | english_stem
 asciiword       | english_stem
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | english_stem
 hword_asciipart | english_stem
 hword_numpart   | simple
 hword_part      | english_stem
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | english_stem
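
If it is not obvious which token type a given piece of text falls under, ts_debug can show it. A small example, using a sample string taken from the data above:

```sql
-- Show the token type (alias) the default parser assigns to each token,
-- and which dictionaries the "english" configuration consults for it.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english', 'home -9972 /partners/application.html');
```

The alias column corresponds to the Token column in the \dF+ listing.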

postgres=# \dF+ english_cfg
Text search configuration "public.english_cfg"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+--------------
 asciihword      | english_stem
 asciiword       | english_stem
 hword           | english_stem
 hword_asciipart | english_stem
 hword_part      | english_stem
 word            | english_stem
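
An alternative sketch that should give the same mapping: copy the built-in configuration and drop the token types you do not want. The name public.english_nonum is made up for this example.

```sql
-- Hypothetical configuration name; starts as a copy of pg_catalog.english.
CREATE TEXT SEARCH CONFIGURATION public.english_nonum (
    COPY = pg_catalog.english
);

-- Tokens of a type with no mapping are discarded by to_tsvector.
ALTER TEXT SEARCH CONFIGURATION public.english_nonum
    DROP MAPPING FOR email, file, float, host, hword_numpart, int,
        numhword, numword, sfloat, uint, url, url_path, version;
```

This variant may be more convenient if you only want to remove a few token types, for example just the numeric ones, and keep the rest as they are.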

Thanks Artur,

That's amazing! Postgres never ceases to amaze me. And the same goes for the contributors to this list.





--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


