Hi Tom,

Thanks for your valuable input.

>Hi,
>
>On 08/22/2015 03:55 AM, Guo, Yun wrote:
>> Hi,
>>
>> We have a query on a column with a GIN index, but the query plan chooses
>> not to use the index and does a seq scan instead, which is much slower:
>>
>> CREATE INDEX idx_access_grants_on_access_tokens ON access_grants USING
>> gin (access_tokens);
>>
>> explain analyze SELECT "access_grants".* FROM "access_grants" WHERE
>> (access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) ;
>>                                                           QUERY PLAN
>> --------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..7.46 rows=1 width=157) (actual time=260.376..260.377 rows=1 loops=1)
>>    ->  Seq Scan on access_grants  (cost=0.00..29718.03 rows=3985 width=157) (actual time=260.373..260.373 rows=1 loops=1)
>>          Filter: (access_tokens @> '{124e5a1f9de325fc176a7c89152ac734}'::text[])
>>          Rows Removed by Filter: 796818
>>  Total runtime: 260.408 ms
>>
>
>I find it very likely that the explain output actually comes from a
>slightly different query, including a LIMIT 1 clause.

You're right, the plan above came from

explain analyze SELECT "access_grants".* FROM "access_grants" WHERE
(access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) LIMIT 1;

We tried removing the "LIMIT 1", which did give us the benefit of the index
for some time. However, after a while it went back to the old behavior of
ignoring the index for

SELECT "access_grants".* FROM "access_grants" WHERE
(access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) ;

We had to turn off sequential scans (enable_seqscan) to force it to use the
index (roughly as in the sketch at the end of this mail), but I'm not sure
this should be the permanent fix.

>That might easily be the problem here, because the optimizer expects the
>3985 "matches" to be uniformly distributed in the table, so it thinks
>it'll scan just a tiny fraction of the table (1/3985) until the first
>match. But it's quite possible all the matching rows are at the end of
>the table, and the executor has to actually scan the whole table.
>
>It's difficult to say without further details of the table and how the
>data are generated.

The access_grants table has 797415 rows and the following schema:

          Column          |            Type             |                          Modifiers
--------------------------+-----------------------------+-------------------------------------------------------------
 id                       | integer                     | not null default nextval('access_grants_id_seq'::regclass)
 user_id                  | integer                     | not null
 code                     | text                        | not null
 client_application_name  | text                        | not null
 access_tokens            | text[]                      | default '{}'::text[]
 created_at               | timestamp without time zone | not null
 updated_at               | timestamp without time zone | not null
 mongo_id                 | text                        |
Indexes:
    "access_grants_pkey" PRIMARY KEY, btree (id)
    "index_access_grants_on_code" UNIQUE, btree (code)
    "index_access_grants_on_mongo_id" UNIQUE, btree (mongo_id)
    "idx_access_grants_on_access_tokens" gin (access_tokens)
    "index_access_grants_on_user_id" btree (user_id)

>> We tested on a smaller table in the development region and it chooses to
>> use the index there. However, on the production-size table it decides to
>> ignore the index for unknown reasons.
>
>Please provide explain output from that table. It's difficult to say
>what's different without seeing the details.
>
>Also please provide important details about the system (e.g. which
>PostgreSQL version, how much RAM, what work_mem/shared_buffers and such
>stuff).
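For reference, the distribution and size figures below can be reproduced
with queries along these lines (a sketch, not necessarily the exact
statements we used):

    -- array length per row; empty and NULL arrays both show up as 0 here
    SELECT coalesce(array_length(access_tokens, 1), 0) AS num_tokens,
           count(*) AS num_rows
    FROM access_grants
    GROUP BY 1
    ORDER BY 1;

    -- table size vs. GIN index size
    SELECT pg_size_pretty(pg_table_size('access_grants')) AS table_size,
           pg_size_pretty(pg_relation_size('idx_access_grants_on_access_tokens')) AS index_size;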
The array length distribution of access_tokens is as follows: 309997 rows
have only one element, 248334 rows have an empty array, 432 rows have an
array length greater than 100, and 1 row has an array length of 3575. The
table size is 154MB and the index size is 180MB. It runs on an AWS
db.r3.xlarge instance (4 virtual cores, 30.5 GiB of memory, general purpose
SSD) with shared_buffers set to 1048576 and work_mem set to 159744.

>> Is the large number of tuples skewing the query planner's decision,
>> or is the index itself larger than the table, so that it would decide
>> to do a table scan?
>
>What large number of tuples? The indexes are supposed to be more
>efficient the larger the table is.
>
>regards
>
>--
>Tomas Vondra                   http://www.2ndQuadrant.com
>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
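PS: the enable_seqscan workaround mentioned earlier was along these lines
(a sketch only; SET here affects just the current session, and the token is
the same example value as above):

    -- temporarily discourage seq scans, check the plan, then reset
    SET enable_seqscan = off;

    EXPLAIN ANALYZE
    SELECT "access_grants".*
    FROM "access_grants"
    WHERE access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734'];

    RESET enable_seqscan;

That at least lets us compare the plan the planner prefers against the
index scan it keeps avoiding.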