Hi Tom,

Thanks for your valuable input.

>Hi,
>
>On 08/22/2015 03:55 AM, Guo, Yun wrote:
>> Hi,
>>
>> We have a query on a column with a GIN index, but the query plan chooses
>> not to use the index and does a seq scan instead, which is much slower:
>>
>> CREATE INDEX idx_access_grants_on_access_tokens ON access_grants USING
>> gin (access_tokens);
>>
>> explain analyze SELECT "access_grants".* FROM "access_grants" WHERE
>> (access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) ;
>>                                                           QUERY PLAN
>> --------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..7.46 rows=1 width=157) (actual time=260.376..260.377 rows=1 loops=1)
>>    ->  Seq Scan on access_grants  (cost=0.00..29718.03 rows=3985 width=157) (actual time=260.373..260.373 rows=1 loops=1)
>>          Filter: (access_tokens @> '{124e5a1f9de325fc176a7c89152ac734}'::text[])
>>          Rows Removed by Filter: 796818
>>  Total runtime: 260.408 ms
>>
>
>I find it very likely that the explain output actually comes from a
>slightly different query, including a LIMIT 1 clause.

You're right, the plan above came from

explain analyze SELECT "access_grants".* FROM "access_grants" WHERE
(access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) LIMIT 1;

We tried removing the "LIMIT 1", which did give us the benefit of the index
for some time. However, after a while it went back to the old behavior of
ignoring the index for

SELECT "access_grants".* FROM "access_grants" WHERE
(access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) ;

We had to turn off sequential scans (enable_seqscan) to force it to use the
index (roughly as in the sketch at the end of this mail), but I'm not sure
this should be the permanent fix.

>That might easily be the problem here, because the optimizer expects the
>3985 "matches" to be uniformly distributed in the table, so it thinks
>it'll scan just a tiny fraction of the table (1/3985) until the first
>match. But it's quite possible all the matching rows are at the end of
>the table, and the executor has to actually scan the whole table.
>
>It's difficult to say without further details of the table and how the
>data are generated.

The access_grants table has 797415 rows and the following schema:

          Column          |            Type             |                          Modifiers
--------------------------+-----------------------------+-------------------------------------------------------------
 id                       | integer                     | not null default nextval('access_grants_id_seq'::regclass)
 user_id                  | integer                     | not null
 code                     | text                        | not null
 client_application_name  | text                        | not null
 access_tokens            | text[]                      | default '{}'::text[]
 created_at               | timestamp without time zone | not null
 updated_at               | timestamp without time zone | not null
 mongo_id                 | text                        |
Indexes:
    "access_grants_pkey" PRIMARY KEY, btree (id)
    "index_access_grants_on_code" UNIQUE, btree (code)
    "index_access_grants_on_mongo_id" UNIQUE, btree (mongo_id)
    "idx_access_grants_on_access_tokens" gin (access_tokens)
    "index_access_grants_on_user_id" btree (user_id)

>> We tested on a smaller table in the development region and it chooses to
>> use the index there. However, on the production-size table it decides to
>> ignore the index for unknown reasons.
>
>Please provide explain output from that table. It's difficult to say
>what's different without seeing the details.
>
>Also please provide important details about the system (e.g. which
>PostgreSQL version, how much RAM, what work_mem/shared_buffers and such
>stuff).
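For reference, the distribution and size figures below can be reproduced
with queries along these lines (a sketch, not necessarily the exact
statements we used):

    -- array length per row; empty and NULL arrays both show up as 0 here
    SELECT coalesce(array_length(access_tokens, 1), 0) AS num_tokens,
           count(*) AS num_rows
    FROM access_grants
    GROUP BY 1
    ORDER BY 1;

    -- table size vs. GIN index size
    SELECT pg_size_pretty(pg_table_size('access_grants')) AS table_size,
           pg_size_pretty(pg_relation_size('idx_access_grants_on_access_tokens')) AS index_size;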
The array length distribution of access_tokens is as follows: 309997 rows
have only one element, 248334 rows have an empty array, 432 rows have an
array length greater than 100, and 1 row has an array length of 3575. The
table size is 154MB and the index size is 180MB. It runs on an AWS
db.r3.xlarge instance (4 virtual cores, 30.5 GiB of memory, general purpose
SSD) with shared_buffers set to 1048576 and work_mem set to 159744.

>> Is the large number of tuples skewing the query planner's decision,
>> or is the index itself larger than the table, so that it would decide
>> to do a table scan?
>
>What large number of tuples? The indexes are supposed to be more
>efficient the larger the table is.
>
>regards
>
>--
>Tomas Vondra                   http://www.2ndQuadrant.com
>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
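PS: the enable_seqscan workaround mentioned earlier was along these lines
(a sketch only; SET here affects just the current session, and the token is
the same example value as above):

    -- temporarily discourage seq scans, check the plan, then reset
    SET enable_seqscan = off;

    EXPLAIN ANALYZE
    SELECT "access_grants".*
    FROM "access_grants"
    WHERE access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734'];

    RESET enable_seqscan;

That at least lets us compare the plan the planner prefers against the
index scan it keeps avoiding.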