index problems (again)

Geoff Winkless <pgsqladmin@xxxxxxxx> · Mon, 7 Mar 2016 11:38:34 +0000

Hi all

Firstly, I appreciate that my index problems are fairly difficult to
debug given that I can't upload the data anywhere (it's commercially
sensitive); I tried creating an equivalent dataset for my last problem
using a lot of random() inserts, but unfortunately, even though the
sizes and index cardinality seemed similar, it didn't exhibit the same
problem, which leaves me a bit stuck.

I now have (what seems to me to be) an utterly bizarre situation where
postgres is using the "wrong" index, to the extent where I can't even
begin to comprehend why it would do so.

http://explain.depesz.com/s/uF4L

# EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
BETWEEN 20160219 AND 20160221;
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=25.54..25.55 rows=1 width=0) (actual
time=25337.593..25337.594 rows=1 loops=1)
   Buffers: shared hit=2976790 read=152188
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..25.54 rows=1 width=4) (actual
time=25337.579..25337.587 rows=1 loops=1)
           Buffers: shared hit=2976790 read=152188
           ->  Index Scan using legs_sc_id_idx on legs
(cost=0.43..361498.49 rows=14394 width=4) (actual
time=25337.578..25337.578 rows=1 loops=1)
                 Index Cond: (sc_id IS NOT NULL)
                 Filter: ((scdate >= 20160219) AND (scdate <= 20160221))
                 Rows Removed by Filter: 4068865
                 Buffers: shared hit=2976790 read=152188
 Planning time: 0.235 ms
 Execution time: 25337.620 ms
(12 rows)

Time: 25338.375 ms

There is an index on scdate,sc_id that (I would have thought) should
be ideal for this query but it's ignored.

sc_id has no null values - it's even _defined_ as NOT NULL. I've no
idea why the planner would think that it needs to use the sc_id index
on this query.

If I create an index on sc_id,scdate, that one is used (index-only
scan) and the query returns in 200ms or so.

http://explain.depesz.com/s/3qNC

=# EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
BETWEEN 20160219 AND 20160221;

 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=7.32..7.33 rows=1 width=0) (actual
time=207.194..207.194 rows=1 loops=1)
   Buffers: shared hit=1 read=11120
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..7.32 rows=1 width=4) (actual
time=207.187..207.187 rows=1 loops=1)
           Buffers: shared hit=1 read=11120
           ->  Index Only Scan using legs_sc_id_scdate_idx on legs
(cost=0.43..99204.99 rows=14394 width=4) (actual time=207.185..207.185
rows=1 loops=1)
                 Index Cond: ((sc_id IS NOT NULL) AND (scdate >=
20160219) AND (scdate <= 20160221))
                 Heap Fetches: 0
                 Buffers: shared hit=1 read=11120
 Planning time: 0.236 ms
 Execution time: 207.223 ms

I'm utterly at a loss. There are only 427 distinct scdate values on
this table, but 4 million sc_id values (and the spread across scdate
is reasonably similar - between 6000 and 11000 for each), so using an
index on (just) sc_id makes absolutely no sense (I would expect it to
be slower than a tablescan, no?). I also don't see how sc_id,scdate is
more useful than scdate,sc_id.

Have I completely misunderstood how this is all meant to work? I tried
reading the documentation around understanding EXPLAIN and the slow
query questions in the FAQ/Wiki but what I read didn't really seem to
suggest any investigative steps other than "RUN ANALYZE and VACUUM" -
is there a good doc on how to go about debugging this kind of thing?
Or even on how the planner makes its decisions?

I'm currently at the point where I'm just throwing random indexes at
tables in the vain hope that it might help. I'm fairly sure that
that's suboptimal :)

As before, pg9.5.1, CentOS 6 x64, 4GB RAM, Xeon X3220.
effective_cache_size is set to 3GB (but changing it wildly up or down
doesn't change anything), shared_buffers is 1GB, work_mem is 5242kB
(but changing to anything up to 1GB makes no difference).

Thanks

Geoff

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general