Re: query not using GIN index

Hi Tom,
Thanks for your valuable input. You're right, the plan was coming from:

explain analyze SELECT "access_grants".* FROM "access_grants" WHERE
(access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) limit 1;

We tried removing "limit 1", which did give us the benefit of using the
index for some time. However, after a while it went back to the old
behavior of ignoring the index for:

SELECT "access_grants".* FROM "access_grants" WHERE
(access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']);

We had to turn off sequential scans (enable_seqscan) to force it to use
the index, but I'm not sure this should be the permanent fix.
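For reference, this is how we limited the change to a single session while
testing, rather than changing it server-wide (a sketch, using the same table
and token value as the query above):

```sql
-- Disable sequential scans for this session only, then compare plans.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT "access_grants".*
FROM "access_grants"
WHERE access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734'];

RESET enable_seqscan;  -- restore the default afterwards
```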
The access_grants table has 797415 rows, with the following schema:
         Column          |            Type             |                         Modifiers
-------------------------+-----------------------------+------------------------------------------------------------
 id                      | integer                     | not null default nextval('access_grants_id_seq'::regclass)
 user_id                 | integer                     | not null
 code                    | text                        | not null
 client_application_name | text                        | not null
 access_tokens           | text[]                      | default '{}'::text[]
 created_at              | timestamp without time zone | not null
 updated_at              | timestamp without time zone | not null
 mongo_id                | text                        |
Indexes:
    "access_grants_pkey" PRIMARY KEY, btree (id)
    "index_access_grants_on_code" UNIQUE, btree (code)
    "index_access_grants_on_mongo_id" UNIQUE, btree (mongo_id)
    "idx_access_grants_on_access_tokens" gin (access_tokens)
    "index_access_grants_on_user_id" btree (user_id)



The array length distribution of access_tokens is as follows: 309997 rows
have exactly one element, 248334 rows have an empty array, 432 rows have
array length >100, and 1 row has array length 3575.
The table size is 154MB, and the index size is 180MB.
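(For anyone wanting to reproduce those distribution numbers, a query along
these lines should work; a sketch assuming no other filtering:)

```sql
-- Bucket rows by array length; array_length() returns NULL for both
-- empty and NULL arrays, so coalesce(..., 0) groups those together.
SELECT coalesce(array_length(access_tokens, 1), 0) AS len,
       count(*) AS num_rows
FROM access_grants
GROUP BY 1
ORDER BY 1;
```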
  
 

It's on an AWS db.r3.xlarge instance (4 virtual cores, 30.5 GiB memory,
general purpose SSD), with shared_buffers set to 1048576 (in 8kB pages,
i.e. 8GB) and work_mem set to 159744 (in kB, i.e. roughly 156MB).
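(Those two settings are reported above in PostgreSQL's internal units; the
pg_settings view shows each value with its unit, e.g.:)

```sql
-- shared_buffers is counted in 8kB pages, work_mem in kB.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem');
```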






On 8/22/15, 12:36 AM, Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> wrote:

>Hi,
>
>On 08/22/2015 03:55 AM, Guo, Yun wrote:
>> Hi,
>>
>> We have a query on a column with a GIN index, but the query plan chooses
>> not to use the index and does a seq scan instead, which is much slower
>>
>> CREATE INDEX idx_access_grants_on_access_tokens ON access_grants USING
>> gin (access_tokens);
>>
>> explain analyze SELECT "access_grants".* FROM "access_grants"  WHERE
>> (access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) ;
>>                                                       QUERY PLAN
>> ------------------------------------------------------------------------------------------------------------------------
>>   Limit  (cost=0.00..7.46 rows=1 width=157) (actual time=260.376..260.377 rows=1 loops=1)
>>     ->  Seq Scan on access_grants  (cost=0.00..29718.03 rows=3985 width=157) (actual time=260.373..260.373 rows=1 loops=1)
>>           Filter: (access_tokens @> '{124e5a1f9de325fc176a7c89152ac734}'::text[])
>>           Rows Removed by Filter: 796818
>>   Total runtime: 260.408 ms
>>
>>
>
>I find it very likely that the explain output actually comes from a
>slightly different query, including a LIMIT 1 clause.
>
>That might easily be the problem here, because the optimizer expects the
>3985 "matches" to be uniformly distributed in the table, so it thinks
>it'll scan just a tiny fraction of the table (1/3985) until the first
>match. But it's quite possible all matching rows are at the end of the
>table, and the executor has to actually scan the whole table.
>
>It's difficult to say without further details of the table and how the
>data are generated.
>
>> We tested on smaller table in development region and it chooses to use
>> the index there. However, in production size table it decides to ignore
>> the index for unknown reasons.
>
>Please provide explain output from that table. It's difficult to say
>what's different without seeing the details.
>
>Also please provide important details about the system (e.g. which
>PostgreSQL version, how much RAM, what work_mem/shared_buffers and such
>stuff).
>
>> Is the large number of tuples skewing the query planner's decision
>> or the index itself is larger than the table therefor it would decide
>> to do table scan?
>
>What large number of tuples? The indexes are supposed to be more
>efficient the larger the table is.
>
>regards
>
>--
>Tomas Vondra                  http://www.2ndQuadrant.com
>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>-- 
>Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance


