Search Postgresql Archives

Re: Incorrect FTS result with GIN index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Artur,

I downloaded your dump and tried your queries with index, I see no problem
so far.

                              Table "public.search_tab"
Column | Type | Modifiers ----------------+---------+----------------------------------------------------------
 id             | integer | not null default nextval('search_tab_id_seq1'::regclass)
 keywords       | text    |
 collection_urn | text    |
 bbox           | text    |
 object_urn     | text    | not null
 description    | text    |
 category       | text    |
 summary        | text    |
priority | integer | Indexes:
    "search_tab_pkey1" PRIMARY KEY, btree (id)
    "idx_keywords_ger" gin (to_tsvector('german'::regconfig, keywords))

test=# explain analyze select count(*) from search_tab where (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and (to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*')); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=103.87..103.88 rows=1 width=0) (actual time=24.784..24.784 rows=1 loops=1)
   ->  Bitmap Heap Scan on search_tab  (cost=5.21..103.80 rows=25 width=0) (actual time=24.642..24.769 rows=123 loops=1)
         Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@ '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@ '''dd'':*'::tsquery))
         ->  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.21 rows=25 width=0) (actual time=24.620..24.620 rows=123 loops=1)
               Index Cond: ((to_tsvector('german'::regconfig, keywords) @@ '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@ '''dd'':*'::tsquery))
 Total runtime: 24.830 ms
(6 rows)

see rows=123


On Thu, 15 Jul 2010, Artur Dabrowski wrote:


Hello,

I was trying to use GIN index, but the results seem be incorrect.


1. QUERY WITHOUT INDEX
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));

count
-------
  123
(1 row)


2. CREATING INDEX
create index idx_keywords_ger on search_tab
using gin(to_tsvector('german', keywords));


3. QUERY WITH INDEX
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));

count
-------
  116
(1 row)


The number of rows is different. To make things more funny and ensure
problem is not caused by dictionary normalisation:

4. EQUIVALENT QUERY WITH INDEX
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* & dd:*'));

count
-------
  123
(1 row)

I tried the same with simple-based dictionary. The problem is always
reproducible.

Total count of records in my database is 1 006 300 if it matters.

One of missing results is the following: "lSWN eeIf hInEI IN
SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce". If the query is more
specifically targeted to find this row then it founds it:

5. MORE DETAILED QUERY WITH INDEX
select keywords from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'eeI:* & dd:*'));

                                    keywords

--------------------------------------------------------------------------------

lSWN eeIf hInEI IN SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce
tSALWIEEIn-3WNecGAINfLuLAV DDLIWNG E Lt h c8  BiIfgGl1 EeIhulSLenS6LDe5O
hGn DDlhIgGEAcS1O eeiEEI WnILWELS68VBLL AGNIAfINt6 lLuWuNeDc ItLfe SL
hGe WIiI EeItnLLuA1efOh3ALWc  uGINEltcIBE LnegLDNA3 DD SVNG LSSIlWfE
eeIW ItueS W39LnELg-GuDLEhAn8BeFG IVi DDNEfLG1SI 1tNIOA  lAhNLLccfWISE l
6em on.0nsRH nehSA2l1HAsauncu0I65l7 ddnsn1SAS i u0eLAnlr t70gaains w gzsH
eeiog
rfiwgso0g364l1 1wU eei1n 5lL dDA 0
DDInNcEfSWAEAtcL1IeSuAG5LE Lilh8tEGeDg f3B eEIOL7h uWV-L1IGN LINWeIn l S
ils eeiru00ewH.6sgAeHoSlLhglso0 asn0u2a atisA0 ddcngAnzRA Se Au2 nm8ns0
uS8snH
DDD EWlE1GShhLe8L NENI  tuL cgGGInfcBAlLfIO L1S eeIWeAEnILStu AViWNI
n IOLLt 0Alih tuWNE L nAGlVSNSDI DDeW BIegfG EeIhL9ELeScELWGAIfN1uIc
DnSE eeIWLu9tLNhNEuAt I1BelhGGfLWLS nSWINI eiELgAIG DDLEclV7 IO c Af
EeIElfN L4I lE2G cSOLniAWgSVItc ILDN L57BuDfALtSIe-WnGhGIW DDA NE1Lhuee
hNILN DD L6flSEeW1gthfI L1WAlENE eEIGIAt VGBDO uGLeLccAeSuLWIn Ii nS
(14 rows)


Did I misunderstood something or is it a bug?

Best regards
Artur


	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux