Re: Strange choice of general index over partial index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16/01/15 13:37, Mark Kirkwood wrote:
On 16/01/15 11:30, Josh Berkus wrote:
This is an obfuscation and mock up, but:

table files (
    id serial pk,
    filename text not null,
    state varchar(20) not null
    ... 18 more columns
)

index file_state on (state)
    (35GB in size)
index file_in_flight_state (state) where state in (
'waiting','assigning', 'processing' )
    (600MB in size)
... 10 more indexes

More important facts:
* state = 'done' 95% of the time.  thereform the partial index
represents only 5% of the table
* all indexes and the table are very bloated
* server has 128GB RAM
* Version 9.2.

Given this setup, I would expect the planner to *always* choose
file_in_flight_state over file_state for this query:

SELECT id, filename FROM files WHERE state = 'waiting';

... and yet it keeps selecting file_state based on extremely small
changes to the stats.   This is important because the same query, using
file_state, is 20X to 50X slower, because that index frequently gets
pushed out of memory.

What am I missing?  Or is this potentially a planner bug for costing?


Are you seeing a bitmapscan access plan? If so see if disabling it gets
you a plan on the files_in_flight index. I'm seeing this scenario with a
fake/generated dataset a bit like yours in 9.2 (9.5 uses the
files_in_flight w/o any coercing).


FWIW: For me 9.2 and 9.3 (default config) generate plans like:
state=# EXPLAIN ANALYZE
SELECT id, filename
FROM files
WHERE state = 'processing';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on files (cost=3102.02..89228.68 rows=164333 width=15) (actual time=26.629..803.507 rows=166696 loops=1)
   Recheck Cond: ((state)::text = 'processing'::text)
   Rows Removed by Index Recheck: 7714304
-> Bitmap Index Scan on file_state (cost=0.00..3060.93 rows=164333 width=0) (actual time=25.682..25.682 rows=166696 loops=1)
         Index Cond: ((state)::text = 'processing'::text)
 Total runtime: 808.662 ms
(6 rows)


whereas 9.4 and 9.5 get:

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
Index Scan using file_in_flight on files (cost=0.42..62857.39 rows=158330 width=15) (actual time=0.055..202.732 rows=166696 loops=1)
   Index Cond: ((state)::text = 'processing'::text)
 Planning time: 24.203 ms
 Execution time: 208.926 ms
(4 rows)


This is with each version loading exactly the same dataset (generated by the attached scripty). Obviously this is a vast simplification of what Josh is looking at - but it is (hopefully) interesting that these later versions are doing so much better...

Cheers

Mark

Attachment: gendata.pl
Description: Perl program

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux