On 02/27/2014 04:06 PM, Jean-Max Reymond wrote:
I am running the last version of PostgreSQL 9.3.3
I have two tables detm and corm and a lot of datas in the column
cormdata of corm table (1.4 GB).
I have a GIN index on cormdata:
CREATE INDEX ix_corm_fulltext_cormdata ON corm
USING gin (to_tsvector('french'::regconfig, cormdata))
WHERE cormishtml IS FALSE AND length(cormdata) < 20000;
select distinct b.detmmailid from corm b where
(to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
b.cormishtml is false and length(b.cormdata) < 20000)
is very fast and use the GIN index.
"HashAggregate (cost=2027.72..2031.00 rows=328 width=52)"
" -> Bitmap Heap Scan on corm b (cost=24.25..2026.35 rows=548 width=52)"
" Recheck Cond: ((to_tsvector('french'::regconfig, cormdata) @@
to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE) AND
(length(cormdata) < 20000))"
" -> Bitmap Index Scan on ix_corm_fulltext_cormdata
(cost=0.00..24.11 rows=548 width=0)"
" Index Cond: (to_tsvector('french'::regconfig, cormdata)
@@ to_tsquery('mauritanie'::text))"
With a join an another table detm, GIN index is not used
explain select distinct a.detmmailid from detm a JOIN corm b on
a.detmmailid = b.detmmailid where ((to_tsvector('french',b.cormdata) @@
to_tsquery('mauritanie') and b.cormishtml is false and
length(b.cormdata) < 20000) OR ( detmobjet ~* 'mauritanie' ))
"HashAggregate (cost=172418.27..172423.98 rows=571 width=52)"
" -> Hash Join (cost=28514.92..172416.85 rows=571 width=52)"
" Hash Cond: (b.detmmailid = a.detmmailid)"
" Join Filter: (((to_tsvector('french'::regconfig, b.cormdata) @@
to_tsquery('mauritanie'::text)) AND (b.cormishtml IS FALSE) AND
(length(b.cormdata) < 20000)) OR (a.detmobjet ~* 'mauritanie'::text))"
" -> Seq Scan on corm b (cost=0.00..44755.07 rows=449507
width=689)"
" -> Hash (cost=19322.74..19322.74 rows=338574 width=94)"
" -> Seq Scan on detm a (cost=0.00..19322.74 rows=338574
width=94)"
If I remove OR ( detmobjet ~* 'mauritanie' ) in the select, the GIN
index is used
explain select distinct a.detmmailid from detm a JOIN corm b on
a.detmmailid = b.detmmailid where ((to_tsvector('french',b.cormdata) @@
to_tsquery('mauritanie') and b.cormishtml is false and
length(b.cormdata) < 20000))
"HashAggregate (cost=4295.69..4301.17 rows=548 width=52)"
" -> Nested Loop (cost=24.67..4294.32 rows=548 width=52)"
" -> Bitmap Heap Scan on corm b (cost=24.25..2026.35 rows=548
width=52)"
" Recheck Cond: ((to_tsvector('french'::regconfig,
cormdata) @@ to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE)
AND (length(cormdata) < 20000))"
" -> Bitmap Index Scan on ix_corm_fulltext_cormdata
(cost=0.00..24.11 rows=548 width=0)"
" Index Cond: (to_tsvector('french'::regconfig,
cormdata) @@ to_tsquery('mauritanie'::text))"
" -> Index Only Scan using pkey_detm on detm a (cost=0.42..4.13
rows=1 width=52)"
" Index Cond: (detmmailid = b.detmmailid)"
How can i force the use of the GIN index ?
thanks for your tips,
The problem with the OR detmobject ~* 'mauritanie' restriction is that
the rows that match that condition cannot be found using the GIN index.
I think you'd want the system to fetch all the rows that match the other
condition using the GIN index, and do something else to find the other
rows. The planner should be able to do that if you rewrite the query as
a UNION:
select a.detmmailid from detm a JOIN corm b on
a.detmmailid = b.detmmailid
where (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
b.cormishtml is false and length(b.cormdata) < 20000)
union
select a.detmmailid from detm a JOIN corm b on
a.detmmailid = b.detmmailid
where detmobjet ~* 'mauritanie'
Note that that will not return rows in 'detm' that have no matching rows
in 'corm' table, even if they match the "detmobjet ~* 'mauritanie"
condition. That's what your original query also did, but if that's not
what you want, leave out the JOIN from the second part of the union.
- Heikki
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance