Re: not using my GIN index in JOIN expression

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 02/27/2014 04:06 PM, Jean-Max Reymond wrote:
I am running the last version of PostgreSQL 9.3.3
I have two tables detm and corm and a lot of datas in the column
cormdata of corm table (1.4 GB).

I have a GIN index on cormdata:
CREATE INDEX ix_corm_fulltext_cormdata  ON corm
    USING gin (to_tsvector('french'::regconfig, cormdata))
    WHERE cormishtml IS FALSE AND length(cormdata) < 20000;

select distinct b.detmmailid from corm b where
(to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
b.cormishtml is false and length(b.cormdata) < 20000)
is very fast and use the GIN index.

"HashAggregate  (cost=2027.72..2031.00 rows=328 width=52)"
"  ->  Bitmap Heap Scan on corm b  (cost=24.25..2026.35 rows=548 width=52)"
"        Recheck Cond: ((to_tsvector('french'::regconfig, cormdata) @@
to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE) AND
(length(cormdata) < 20000))"
"        ->  Bitmap Index Scan on ix_corm_fulltext_cormdata
(cost=0.00..24.11 rows=548 width=0)"
"              Index Cond: (to_tsvector('french'::regconfig, cormdata)
@@ to_tsquery('mauritanie'::text))"


With a join an another table detm, GIN index is not used


   explain select distinct a.detmmailid from detm a  JOIN corm b on
a.detmmailid = b.detmmailid  where ((to_tsvector('french',b.cormdata) @@
to_tsquery('mauritanie') and b.cormishtml is false and
length(b.cormdata) < 20000)  OR ( detmobjet ~* 'mauritanie' ))

"HashAggregate  (cost=172418.27..172423.98 rows=571 width=52)"
"  ->  Hash Join  (cost=28514.92..172416.85 rows=571 width=52)"
"        Hash Cond: (b.detmmailid = a.detmmailid)"
"        Join Filter: (((to_tsvector('french'::regconfig, b.cormdata) @@
to_tsquery('mauritanie'::text)) AND (b.cormishtml IS FALSE) AND
(length(b.cormdata) < 20000)) OR (a.detmobjet ~* 'mauritanie'::text))"
"        ->  Seq Scan on corm b  (cost=0.00..44755.07 rows=449507
width=689)"
"        ->  Hash  (cost=19322.74..19322.74 rows=338574 width=94)"
"              ->  Seq Scan on detm a  (cost=0.00..19322.74 rows=338574
width=94)"


If I remove   OR ( detmobjet ~* 'mauritanie' ) in the select, the GIN
index is used
   explain select distinct a.detmmailid from detm a  JOIN corm b on
a.detmmailid = b.detmmailid  where ((to_tsvector('french',b.cormdata) @@
to_tsquery('mauritanie') and b.cormishtml is false and
length(b.cormdata) < 20000))

"HashAggregate  (cost=4295.69..4301.17 rows=548 width=52)"
"  ->  Nested Loop  (cost=24.67..4294.32 rows=548 width=52)"
"        ->  Bitmap Heap Scan on corm b  (cost=24.25..2026.35 rows=548
width=52)"
"              Recheck Cond: ((to_tsvector('french'::regconfig,
cormdata) @@ to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE)
AND (length(cormdata) < 20000))"
"              ->  Bitmap Index Scan on ix_corm_fulltext_cormdata
(cost=0.00..24.11 rows=548 width=0)"
"                    Index Cond: (to_tsvector('french'::regconfig,
cormdata) @@ to_tsquery('mauritanie'::text))"
"        ->  Index Only Scan using pkey_detm on detm a  (cost=0.42..4.13
rows=1 width=52)"
"              Index Cond: (detmmailid = b.detmmailid)"

How can i force the use of the GIN index ?
thanks for your tips,

The problem with the OR detmobject ~* 'mauritanie' restriction is that the rows that match that condition cannot be found using the GIN index. I think you'd want the system to fetch all the rows that match the other condition using the GIN index, and do something else to find the other rows. The planner should be able to do that if you rewrite the query as a UNION:

select a.detmmailid from detm a JOIN corm b on
a.detmmailid = b.detmmailid
where (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and b.cormishtml is false and length(b.cormdata) < 20000)
union
select a.detmmailid from detm a  JOIN corm b on
a.detmmailid = b.detmmailid
where detmobjet ~* 'mauritanie'

Note that that will not return rows in 'detm' that have no matching rows in 'corm' table, even if they match the "detmobjet ~* 'mauritanie" condition. That's what your original query also did, but if that's not what you want, leave out the JOIN from the second part of the union.

- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux