Hi,

On 08/22/2015 03:55 AM, Guo, Yun wrote:

> Hi,
>
> We have a query on a column with a GIN index, but the query plan
> chooses not to use the index and does a seq scan instead, which is
> much slower.
>
> CREATE INDEX idx_access_grants_on_access_tokens
>     ON access_grants USING gin (access_tokens);
>
> explain analyze SELECT "access_grants".* FROM "access_grants"
>     WHERE (access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']);
>
>                              QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..7.46 rows=1 width=157) (actual time=260.376..260.377 rows=1 loops=1)
>    ->  Seq Scan on access_grants  (cost=0.00..29718.03 rows=3985 width=157) (actual time=260.373..260.373 rows=1 loops=1)
>          Filter: (access_tokens @> '{124e5a1f9de325fc176a7c89152ac734}'::text[])
>          Rows Removed by Filter: 796818
>  Total runtime: 260.408 ms
I find it very likely that the explain output actually comes from a slightly different query, including a LIMIT 1 clause.
That might easily be the problem here, because the optimizer expects the 3985 "matches" to be uniformly distributed in the table, so it thinks it'll scan just a tiny fraction of the table (1/3985) before hitting the first match. But it's quite possible all the matching rows are at the end of the table, so the executor actually has to scan the whole table.
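Incidentally, the numbers in the plan are consistent with that reading: the Limit cost is the seq scan total cost scaled down by the expected fraction of rows it needs, i.e. 29718.03 * (1/3985) ~= 7.46, which is exactly what the plan shows.

A simple way to see what the planner thinks of the index scan is to disable seq scans for the session and rerun the query (a diagnostic trick only, not something for production):

    SET enable_seqscan = off;
    EXPLAIN ANALYZE
    SELECT "access_grants".*
      FROM "access_grants"
     WHERE (access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734'])
     LIMIT 1;

Comparing the cost and actual time of that plan against the seq scan should show how far off the planner's estimates are.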
It's difficult to say without further details of the table and how the data are generated.
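If you want to experiment, reproducing the "all matches at the end of the table" scenario is fairly easy. Just a sketch, with made-up row counts roughly matching your table:

    CREATE TABLE access_grants (id serial PRIMARY KEY, access_tokens text[]);

    -- ~800k rows of random tokens that never match the lookup value
    INSERT INTO access_grants (access_tokens)
    SELECT ARRAY[md5(random()::text)]
      FROM generate_series(1, 800000);

    -- the matching row, placed at the very end of the heap
    INSERT INTO access_grants (access_tokens)
    VALUES (ARRAY['124e5a1f9de325fc176a7c89152ac734']);

    CREATE INDEX idx_access_grants_on_access_tokens
        ON access_grants USING gin (access_tokens);
    ANALYZE access_grants;

With a LIMIT 1 query the planner still assumes the matches are spread uniformly, while the executor has to walk almost the whole table before finding the row.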
> We tested on a smaller table in the development environment and it
> chooses to use the index there. However, on the production-size table
> it decides to ignore the index, for unknown reasons.
Please provide the EXPLAIN ANALYZE output from that table too. It's difficult to say what's different without seeing the details.
Also please provide the important details about the system (e.g. which PostgreSQL version, how much RAM, what work_mem/shared_buffers are set to, and so on).
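For instance, something like this collects most of it:

    SELECT version();
    SHOW shared_buffers;
    SHOW work_mem;
    SHOW effective_cache_size;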
> Is the large number of tuples skewing the query planner's decision,
> or is the index itself larger than the table, and therefore it
> decides to do a table scan?
What large number of tuples? Indexes are supposed to be more efficient the larger the table is.
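If you suspect the index size plays a role, comparing the two is easy:

    SELECT pg_size_pretty(pg_relation_size('access_grants'))  AS table_size,
           pg_size_pretty(pg_relation_size('idx_access_grants_on_access_tokens')) AS index_size;

But even if the index were larger than the table, that alone shouldn't make the planner skip it for a selective @> lookup.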
regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services