On 16/01/15 13:37, Mark Kirkwood wrote:
On 16/01/15 11:30, Josh Berkus wrote:This is an obfuscation and mock up, but: table files ( id serial pk, filename text not null, state varchar(20) not null ... 18 more columns ) index file_state on (state) (35GB in size) index file_in_flight_state (state) where state in ( 'waiting','assigning', 'processing' ) (600MB in size) ... 10 more indexes More important facts: * state = 'done' 95% of the time. thereform the partial index represents only 5% of the table * all indexes and the table are very bloated * server has 128GB RAM * Version 9.2. Given this setup, I would expect the planner to *always* choose file_in_flight_state over file_state for this query: SELECT id, filename FROM files WHERE state = 'waiting'; ... and yet it keeps selecting file_state based on extremely small changes to the stats. This is important because the same query, using file_state, is 20X to 50X slower, because that index frequently gets pushed out of memory. What am I missing? Or is this potentially a planner bug for costing?Are you seeing a bitmapscan access plan? If so see if disabling it gets you a plan on the files_in_flight index. I'm seeing this scenario with a fake/generated dataset a bit like yours in 9.2 (9.5 uses the files_in_flight w/o any coercing).
FWIW: For me 9.2 and 9.3 (default config) generate plans like: state=# EXPLAIN ANALYZE SELECT id, filename FROM files WHERE state = 'processing';QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------Bitmap Heap Scan on files (cost=3102.02..89228.68 rows=164333 width=15) (actual time=26.629..803.507 rows=166696 loops=1)
Recheck Cond: ((state)::text = 'processing'::text) Rows Removed by Index Recheck: 7714304-> Bitmap Index Scan on file_state (cost=0.00..3060.93 rows=164333 width=0) (actual time=25.682..25.682 rows=166696 loops=1)
Index Cond: ((state)::text = 'processing'::text) Total runtime: 808.662 ms (6 rows) whereas 9.4 and 9.5 get:QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------Index Scan using file_in_flight on files (cost=0.42..62857.39 rows=158330 width=15) (actual time=0.055..202.732 rows=166696 loops=1)
Index Cond: ((state)::text = 'processing'::text) Planning time: 24.203 ms Execution time: 208.926 ms (4 rows)This is with each version loading exactly the same dataset (generated by the attached scripty). Obviously this is a vast simplification of what Josh is looking at - but it is (hopefully) interesting that these later versions are doing so much better...
Cheers Mark
Attachment:
gendata.pl
Description: Perl program
-- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance