Search Postgresql Archives

Re: tsvector not giving expected results on one host

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dan,

it's always good to specify configuration name in a query to avoid
recheck,  since
websearch_to_tsquery(regconfig, text) is immutable, while
websearch_to_tsquery(text) is stable.

See the difference:

[local]:5433 oleg@oleg=# explain (analyze,costs off)  select title
from apod where  websearch_to_tsquery('simple','galaxies') @@ fts;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Bitmap Heap Scan on apod (actual time=0.008..0.008 rows=0 loops=1)
   Recheck Cond: ('''galaxies'''::tsquery @@ fts)
   ->  Bitmap Index Scan on gin_apod_fts_idx (actual time=0.007..0.007
rows=0 loops=1)
         Index Cond: (fts @@ '''galaxies'''::tsquery)
 Planning Time: 0.134 ms
 Execution Time: 0.022 ms
(6 rows)

Time: 0.369 ms
[local]:5433 oleg@oleg=# explain (analyze,costs off)  select title
from apod where  websearch_to_tsquery('galaxies') @@ fts;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Bitmap Heap Scan on apod (actual time=0.107..1.463 rows=493 loops=1)
   Filter: (websearch_to_tsquery('galaxies'::text) @@ fts)
   Heap Blocks: exact=276
   ->  Bitmap Index Scan on gin_apod_fts_idx (actual time=0.059..0.059
rows=493 loops=1)
         Index Cond: (fts @@ websearch_to_tsquery('galaxies'::text))
 Planning Time: 0.125 ms
 Execution Time: 1.518 ms
(7 rows)

On Sat, Dec 17, 2022 at 11:34 PM Dan Langille <dan@xxxxxxxxxxxx> wrote:
>
> On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote:
> > On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote:
> >> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
> >>> "Dan Langille" <dan@xxxxxxxxxxxx> writes:
> >>>>  pkgmessage_textsearchable2 | tsvector                 |           |          | generated always as (to_tsvector('english'::regconfig, translate(pkgmessage, '/'::text, ' '::text))) stored
> >>>
> >>> That is not likely to play well with this:
> >>>
> >>>> freshports.org=> show  default_text_search_config ;
> >>>>  default_text_search_config
> >>>> ----------------------------
> >>>>  pg_catalog.simple
> >>>
> >>> because "english" and "simple" will stem words differently.
> >>>
> >>> regression=# select websearch_to_tsquery('english', 'example');
> >>>  websearch_to_tsquery
> >>> ----------------------
> >>>  'exampl'
> >>> (1 row)
> >>>
> >>> regression=# select websearch_to_tsquery('simple', 'example');
> >>>  websearch_to_tsquery
> >>> ----------------------
> >>>  'example'
> >>> (1 row)
> >>>
> >>> If what is in your tsvector is 'exampl', then only the first of
> >>> these will match.  So IMO the question is not "why is it failing
> >>> on prod?", it's "how the heck did it work on the other machine?".
> >>> You won't get nice results if websearch_to_tsquery is using a
> >>> different TS configuration than to_tsvector did.
> >>
> >> I think this shows why we are getting the results we see.  Credit to ch
> >> on IRC for asking this question.
> >>
> >> The problem host:
> >>
> >> freshports.org=> select websearch_to_tsquery('example');
> >>  websearch_to_tsquery
> >> ----------------------
> >>  'example'
> >> (1 row)
> >
> > Ahh, this explains the differences and as to why it works where it shouldn't?
> >
> > freshports.org=> select setting, source from pg_settings where name =
> > 'default_text_search_config';
> >       setting      | source
> > -------------------+---------
> >  pg_catalog.simple | default
> > (1 row)
> >
> >
> >>
> >> The hosts on which this search works
> >>
> >> freshports.devgit=# select websearch_to_tsquery('example');
> >>  websearch_to_tsquery
> >> ----------------------
> >>  'exampl'
> >> (1 row)
> >
> >
> > freshports.devgit=# select setting, source from pg_settings where name
> > = 'default_text_search_config';
> >       setting       |       source
> > --------------------+--------------------
> >  pg_catalog.english | configuration file
> > (1 row)
> >
> >
> > At least now I know what I can play with to get all hosts in sync.
>
> Here we go, on the problem database, create a new field, based on simple, not english.
>
> ALTER TABLE public.ports
>     ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as (to_tsvector('simple'::regconfig, translate(pkgmessage, '/'::text, ' '::text))) stored;
>
> Index it:
>
> CREATE INDEX ports_pkgmessage_textsearchable3_idx
>     ON public.ports USING gin
>     (pkgmessage_textsearchable3)
>     TABLESPACE pg_default;
> CREATE INDEX
>
> query it:
>
> freshports.org=> SELECT id as port_id, element_pathname(element_id)
>   FROM ports
>  WHERE pkgmessage_textsearchable3  @@ websearch_to_tsquery('example');
>  port_id |                           element_pathname
> ---------+----------------------------------------------------------------------
>    34126 | /ports/head/security/pond
>    74559 | /ports/branches/2015Q3/emulators/linux_base-c6
>    60310 | /ports/branches/2020Q4/www/gitlab-ce
>    38345 | /ports/head/www/gitlab
>    46842 | /ports/branches/2018Q1/mail/postfix-sasl
>    51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod
>
> Success. Thank you Mr Lane.
>
> --
>   Dan Langille
>   dan@xxxxxxxxxxxx
>
>


-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux