Hi all,

Tomas is absolutely right. The distribution I synthetically created had 6M records for this driver, but all of them were very old: as you can see below, the backward scan over the time index had to skip 9,000,000 newer rows before finding a suitable record.

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM updates WHERE driver_id = 100 ORDER BY "time" DESC LIMIT 1;

                                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..0.65 rows=1 width=36) (actual time=3827.807..3827.807 rows=1 loops=1)
   Buffers: shared hit=24592 read=99594 written=659
   ->  Index Scan Backward using updates_time_idx on updates  (cost=0.44..1284780.53 rows=6064800 width=36) (actual time=3827.805..3827.805 rows=1 loops=1)
         Filter: (driver_id = 100)
         Rows Removed by Filter: 9000000
         Buffers: shared hit=24592 read=99594 written=659
 Planning time: 0.159 ms
 Execution time: 3827.846 ms
(8 rows)

Attached are my tests, where I was able to reproduce the problem using default settings on 9.6, 9.5 and 9.3. 9.6 and 9.5 chose the wrong index, while 9.3 did not. (Update: 9.5 didn't fail on my last run.)
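For context, a composite index matching both the filter and the sort order lets the planner answer this query by reading a single index tuple instead of walking the time index backwards. This is only a sketch; the index name is my own, and the thread suggests such an index may already exist, in which case the question is purely why the planner prefers updates_time_idx:

```sql
-- Hypothetical composite index (name is mine): with driver_id fixed by the
-- equality predicate, entries for that driver are already in "time" order,
-- so LIMIT 1 needs only the first matching index entry.
CREATE INDEX updates_driver_time_idx ON updates (driver_id, "time" DESC);

-- The problematic query, unchanged; with the composite index available the
-- plan should become a simple Index Scan with no rows removed by a filter.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM updates
WHERE driver_id = 100
ORDER BY "time" DESC
LIMIT 1;
```

If the composite index exists but is not chosen, a common (if inelegant) trick is to make the ORDER BY unusable for the single-column index, e.g. `ORDER BY "time" DESC, driver_id DESC`; whether this steers the planner as intended should be verified with EXPLAIN on the affected version.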
Attachments:
  test_bad_index_choice.sql
  bad_idx_choice.9.6.out
  bad_idx_choice.9.5.out
  bad_idx_choice.9.3.out
However, when I gave this strange distribution (~30% of all rows concentrated in one value) to more than one value, the bad index choice did not happen again on any of the versions.

I hope this helps.

Regards,
Daniel Blanch.

> On 10 Dec 2016, at 21:34, Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 12/10/2016 12:51 AM, Tom Lane wrote:
>> Eric Jiang <eric@xxxxxxxxxxxxx> writes:
>>> I have a query that I *think* should use a multicolumn index, but
>>> sometimes isn't, resulting in slow queries.
>>
>> I tried to duplicate this behavior, without success. Are you running
>> with nondefault planner parameters?
>>
>
> My guess is this is a case of LIMIT where the planner assumes the matching rows are uniformly distributed in the input data. It likely concludes that for a driver with a lot of data we'll find the first row using ix_updates_time very quickly, and that this will be cheaper than inspecting the larger multi-column index. But imagine a driver with lots of data, all of it a long time ago. That breaks the LIMIT estimate fairly badly.
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
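The skewed distribution Tomas describes can be reproduced synthetically. This is a hedged sketch (the schema is guessed from the plan above, and column names besides driver_id and "time" are mine): one driver gets millions of old rows, while all recent rows belong to other drivers, so a backward scan on the time index must skip every recent row before hitting the target driver.

```sql
-- Hypothetical reproduction schema, inferred from the plan in this thread.
CREATE TABLE updates (driver_id int, "time" timestamptz, payload text);

-- 6M old rows for driver 100 ("lots of data, long time ago") ...
INSERT INTO updates
SELECT 100, now() - interval '2 years' + i * interval '1 second', 'x'
FROM generate_series(1, 6000000) AS i;

-- ... followed by 9M newer rows spread across other drivers, matching the
-- "Rows Removed by Filter: 9000000" seen in the EXPLAIN output.
INSERT INTO updates
SELECT 1 + (i % 99), now() - interval '1 year' + i * interval '1 second', 'x'
FROM generate_series(1, 9000000) AS i;

CREATE INDEX updates_time_idx ON updates ("time");
ANALYZE updates;
```

Running the LIMIT 1 query from the start of this message against such data should show whether a given version falls into the same backward-scan trap.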