Hi all, If anyone still interested in the issue I think I have a very plausible explanation of Eric’s postgresql bad index choice that is: bloated updates_driver_id_time_idx index. Though it’s possible to fool postgresql planner, as I’ve shown in previous tests, this happens with a very concrete data distribution ~ 100 evenly distributed keys over 15M records and ~ 6M records under one single key, if you play a bit with figures it doesn’t happen anymore. Eric’s data distribution wasn’t that extreme, as far as I know he had ~ 100K vs 500K distributions … Well, I’ve been able to reproduce the problem with a close data distribution to Erik’s. I made it creating a ‘bloated index’. If optimal index is too big, postgres tries with another suboptimal index, in this case index ’time’. See this excerpt of my tests results: (..) -- populate table with 99 homogeneus distributed values INSERT INTO updates SELECT q, q % 99, q, q, to_timestamp(q), q % 99 FROM generate_series(1, 15000000) q; INSERT 0 15000000 Time: 65686,547 ms -- populate table with 1 value with 500K rows, simmilar distribution you posted. INSERT INTO updates SELECT q + 15000000, 100, q, q, to_timestamp(q), -- timestamp will start at 1 at end at 6M 100 FROM generate_series(1, 500000) q; INSERT 0 500000 Time: 2463,073 ms -- add constraints and indexes (…) -- create 'bloated' driver_id, time index. CREATE INDEX ON updates (driver_id, "time") WITH (fillfactor = 10); CREATE INDEX Time: 41234,091 ms -- check index sizes, updates_driver_id_idx is huge. SELECT relname, relpages FROM pg_class WHERE relname LIKE 'updates%'; relname | relpages -----------------------------+---------- updates | 129167 updates_driver_id_time_idx | 576919 updates_id_seq | 1 updates_pkey | 42502 updates_time_idx | 42502 updates_vehicle_id_time_idx | 59684 (6 rows) Time: 16,810 ms -- check behavior with bloated index ANALYZE updates; ANALYZE Time: 254,917 ms (..) Time: 4,635 ms EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM updates WHERE driver_id = 100 ORDER BY "time" DESC LIMIT 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.43..1.91 rows=1 width=36) (actual time=21486.015..21486.015 rows=1 loops=1) Buffers: shared hit=39618 read=160454 written=592 -> Index Scan Backward using updates_time_idx on updates (cost=0.43..691283.45 rows=469134 width=36) (actual time=21486.014..21486.014 rows=1 loops=1) Filter: (driver_id = 100) Rows Removed by Filter: 14500000 Buffers: shared hit=39618 read=160454 written=592 Planning time: 0.171 ms Execution time: 21486.068 ms (8 rows) Time: 21486,905 ms -- rebuild index with default fillfactor ALTER INDEX updates_driver_id_time_idx SET (fillfactor = 90); ALTER INDEX Time: 0,682 ms REINDEX INDEX updates_driver_id_time_idx; REINDEX Time: 23559,530 ms -- recheck index sizes, updates_driver_id_idx should look pretty simmilar to others. SELECT relname, relpages FROM pg_class WHERE relname LIKE 'updates%'; relname | relpages -----------------------------+---------- updates | 129167 updates_driver_id_time_idx | 59684 updates_id_seq | 1 updates_pkey | 42502 updates_time_idx | 42502 updates_vehicle_id_time_idx | 59684 (6 rows) Time: 0,452 ms -- check behavior with regular sized index EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM updates WHERE driver_id = 100 ORDER BY "time" DESC LIMIT 1; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.56..1.69 rows=1 width=36) (actual time=0.032..0.033 rows=1 loops=1) Buffers: shared hit=2 read=3 -> Index Scan Backward using updates_driver_id_time_idx on updates (cost=0.56..529197.34 rows=469133 width=36) (actual time=0.032..0.032 rows=1 loops=1) Index Cond: (driver_id = 100) Buffers: shared hit=2 read=3 Planning time: 0.074 ms Execution time: 0.046 ms (7 rows) Time: 0,312 ms @Eric How to solve the problem: First of all check if this is the case, check indexes sizes, if you have this problem updates_driver_id_time_idx should be significantly bigger than others. SELECT relname, relpages FROM pg_class WHERE relname LIKE 'updates%’; Check index configuration to see if you have different fillfactor configuration \d+ updates_driver_id_time_idx If you have setup a different fillfactor, turn it to normal, that is 90%. I don’t see why you should have a low fillfactor, your data doesn’t seem to have frecuent updates, by the contrary, it seems only write and read data. ALTER INDEX updates_driver_id_time_idx SET (fillfactor = 90) If your index fillfactor is normal, there is a chance it got bloated, but this is rare. Reindex your data. REINDEX INDEX updates_driver_id_time_idx; Run tests again. Regards, Daniel Blanch. P.S. Here you have my full tests and output, you might find them useful. Don’t forget to show us index sizes and index configuration, please. |
Attachment:
test_bad_index_choice_bloated_index.sql
Description: Binary data
Attachment:
test_bad_index_choice_bloated_index.out
Description: Binary data
|