Hi,

On 08/22/2015 03:55 AM, Guo, Yun wrote:

> Hi,
>
> We have a query on a column with a GIN index, but the query plan
> chooses not to use the index and does a seq scan instead, which is
> much slower.
>
> CREATE INDEX idx_access_grants_on_access_tokens
>     ON access_grants USING gin (access_tokens);
>
> explain analyze SELECT "access_grants".* FROM "access_grants"
>     WHERE (access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']);
>
>                              QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..7.46 rows=1 width=157) (actual time=260.376..260.377 rows=1 loops=1)
>    ->  Seq Scan on access_grants  (cost=0.00..29718.03 rows=3985 width=157) (actual time=260.373..260.373 rows=1 loops=1)
>          Filter: (access_tokens @> '{124e5a1f9de325fc176a7c89152ac734}'::text[])
>          Rows Removed by Filter: 796818
>  Total runtime: 260.408 ms
I find it very likely that the explain output actually comes from a slightly different query, including a LIMIT 1 clause.
That might easily be the problem here, because the optimizer expects the 3985 "matches" to be uniformly distributed in the table, so it thinks it'll scan just a tiny fraction of the table (1/3985) before hitting the first match. But it's quite possible all the matching rows are at the end of the table, so the executor actually has to scan the whole table.
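Incidentally, the numbers in the plan are consistent with that reading: the Limit cost is the seq scan total cost scaled down by the expected fraction of rows it needs, i.e. 29718.03 * (1/3985) ~= 7.46, which is exactly what the plan shows.

A simple way to see what the planner thinks of the index scan is to disable seq scans for the session and rerun the query (a diagnostic trick only, not something for production):

    SET enable_seqscan = off;
    EXPLAIN ANALYZE
    SELECT "access_grants".*
      FROM "access_grants"
     WHERE (access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734'])
     LIMIT 1;

Comparing the cost and actual time of that plan against the seq scan should show how far off the planner's estimates are.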
It's difficult to say without further details of the table and how the data are generated.
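If you want to experiment, reproducing the "all matches at the end of the table" scenario is fairly easy. Just a sketch, with made-up row counts roughly matching your table:

    CREATE TABLE access_grants (id serial PRIMARY KEY, access_tokens text[]);

    -- ~800k rows of random tokens that never match the lookup value
    INSERT INTO access_grants (access_tokens)
    SELECT ARRAY[md5(random()::text)]
      FROM generate_series(1, 800000);

    -- the matching row, placed at the very end of the heap
    INSERT INTO access_grants (access_tokens)
    VALUES (ARRAY['124e5a1f9de325fc176a7c89152ac734']);

    CREATE INDEX idx_access_grants_on_access_tokens
        ON access_grants USING gin (access_tokens);
    ANALYZE access_grants;

With a LIMIT 1 query the planner still assumes the matches are spread uniformly, while the executor has to walk almost the whole table before finding the row.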
> We tested on a smaller table in the development environment and it
> chooses to use the index there. However, on the production-size table
> it decides to ignore the index, for unknown reasons.
Please provide the EXPLAIN ANALYZE output from that table too. It's difficult to say what's different without seeing the details.
Also please provide the important details about the system (e.g. which PostgreSQL version, how much RAM, what work_mem/shared_buffers are set to, and so on).
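For instance, something like this collects most of it:

    SELECT version();
    SHOW shared_buffers;
    SHOW work_mem;
    SHOW effective_cache_size;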
> Is the large number of tuples skewing the query planner's decision,
> or is the index itself larger than the table, and therefore it
> decides to do a table scan?
What large number of tuples? Indexes are supposed to be more efficient the larger the table is.
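If you suspect the index size plays a role, comparing the two is easy:

    SELECT pg_size_pretty(pg_relation_size('access_grants'))  AS table_size,
           pg_size_pretty(pg_relation_size('idx_access_grants_on_access_tokens')) AS index_size;

But even if the index were larger than the table, that alone shouldn't make the planner skip it for a selective @> lookup.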
regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services