On Mon, Aug 24, 2015 at 8:18 AM, Guo, Yun <YGuo@xxxxxxxxx> wrote:
From: Jeff Janes <jeff.janes@xxxxxxxxx>
Date: Friday, August 21, 2015 at 10:44 PM
To: Yun <yguo@xxxxxxxxx>
Subject: Re: query not using GIN index
On Fri, Aug 21, 2015 at 6:55 PM, Guo, Yun <YGuo@xxxxxxxxx> wrote:
Hi,
We have a query on a column with a GIN index, but the query plan chooses not to use the index and does a seq scan instead, which is much slower:
CREATE INDEX idx_access_grants_on_access_tokens ON access_grants USING gin (access_tokens);
explain analyze SELECT "access_grants".* FROM "access_grants" WHERE (access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) ;

                                                         QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7.46 rows=1 width=157) (actual time=260.376..260.377 rows=1 loops=1)
   ->  Seq Scan on access_grants  (cost=0.00..29718.03 rows=3985 width=157) (actual time=260.373..260.373 rows=1 loops=1)
         Filter: (access_tokens @> '{124e5a1f9de325fc176a7c89152ac734}'::text[])
         Rows Removed by Filter: 796818
 Total runtime: 260.408 ms
What version are you running? What are your non-default configuration settings (particularly for the *_cost parameters)?
select name, setting from pg_settings where name like '%cost';

         name          | setting
-----------------------+---------
 cpu_index_tuple_cost  | 0.005
 cpu_operator_cost     | 0.0025
 cpu_tuple_cost        | 0.01
 random_page_cost      | 4
 seq_page_cost         | 1
OK, thanks. I had overlooked the "LIMIT" in the first plan you posted, and so thought you must have some pretty weird settings. With the LIMIT, the plan makes more sense under normal settings like the ones you show.
Can you turn track_io_timing on and then report an explain (analyze, buffers) of the same query?
I didn’t try this as our prod instance is on AWS and setting this would require a reboot.
OK, but you can still do an "explain (analyze,buffers)". It is less useful than with track_io_timing on, but it is still more useful than just "explain analyze".
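That is, just run the same query you posted with the extended form:

EXPLAIN (ANALYZE, BUFFERS)
SELECT "access_grants".*
FROM "access_grants"
WHERE access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734'];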
Then do a "set enable_seqscan=off" and repeat.
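To keep that from affecting anything else, set it only in your own session and reset it when you are done, something like:

SET enable_seqscan = off;
-- rerun the EXPLAIN (ANALYZE, BUFFERS) query from above here
RESET enable_seqscan;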
This is a lifesaver! After applying this, it's able to use the index. But should we consider it a permanent solution?
No, probably not a permanent solution. Or at least, I only do things like that in production as a last resort. I suggested it so you could force the index to be used and see what the explain (analyze, buffers) output looks like when it does use the index. Sorry for not being more clear.
The seq scan thinks it is going to find a matching row pretty early in the scan and can stop at the first one, but based on "Rows Removed by Filter: 796818" it isn't actually finding a match until the end. There probably isn't much you can do about this, other than not using a LIMIT.
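You can see that expectation in the plan numbers: the seq scan is costed at 29718.03 for an estimated 3985 matching rows, so the planner prices fetching the first match at roughly 29718.03 / 3985 ≈ 7.46, which is the 0.00..7.46 shown on the Limit node. In reality it had to filter out 796818 rows before finding one.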
The reason it thinks it will find a row soon is that it thinks 0.5% of the rows meet your criteria. That is the default selectivity estimate it uses when it has nothing better to go on. Raising the statistics target on the column might help. But I doubt it, because access tokens are probably nearly unique, so even the highest possible statistics target is not going to get it to record MCE statistics. See https://commitfest.postgresql.org/6/323/ for a possible solution, but any fix for that won't be released to production for a long time.
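If you want to try it anyway, the usual pattern would be something like this (10000 is the maximum target; the column name is taken from your index definition):

ALTER TABLE access_grants ALTER COLUMN access_tokens SET STATISTICS 10000;
ANALYZE access_grants;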
If your GIN index has a large pending list, that will make the index scan look very expensive. Vacuuming the table will clear that up, and setting fastupdate to off for the index will prevent it from growing again. Based on your description of most lists having 0 or 1 element in them, and my assumption that a table named "access_grants" isn't getting updated hundreds of times a second, I don't think turning fastupdate off is going to cause any problems at all.
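Concretely, that would be something like:

VACUUM access_grants;  -- flushes the GIN pending list into the main index
ALTER INDEX idx_access_grants_on_access_tokens SET (fastupdate = off);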
Cheers,
Jeff