Hi all,

Tomas is absolutely right. The distribution I synthetically created had 6M records for this driver, but all of them were very old: as you can see below, the backward scan over the time index had to skip 9,000,000 newer rows before finding a suitable record.

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM updates WHERE driver_id = 100 ORDER BY "time" DESC LIMIT 1;

                                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..0.65 rows=1 width=36) (actual time=3827.807..3827.807 rows=1 loops=1)
   Buffers: shared hit=24592 read=99594 written=659
   ->  Index Scan Backward using updates_time_idx on updates  (cost=0.44..1284780.53 rows=6064800 width=36) (actual time=3827.805..3827.805 rows=1 loops=1)
         Filter: (driver_id = 100)
         Rows Removed by Filter: 9000000
         Buffers: shared hit=24592 read=99594 written=659
 Planning time: 0.159 ms
 Execution time: 3827.846 ms
(8 rows)

Attached are my tests, where I was able to reproduce the problem using default settings on 9.6, 9.5 and 9.3. 9.6 and 9.5 chose the wrong index, while 9.3 did not. (Update: 9.5 didn't fail on my last run.)
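For context, a composite index matching both the filter and the sort order lets the planner answer this query by reading a single index tuple instead of walking the time index backwards. This is only a sketch; the index name is my own, and the thread suggests such an index may already exist, in which case the question is purely why the planner prefers updates_time_idx:

```sql
-- Hypothetical composite index (name is mine): with driver_id fixed by the
-- equality predicate, entries for that driver are already in "time" order,
-- so LIMIT 1 needs only the first matching index entry.
CREATE INDEX updates_driver_time_idx ON updates (driver_id, "time" DESC);

-- The problematic query, unchanged; with the composite index available the
-- plan should become a simple Index Scan with no rows removed by a filter.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM updates
WHERE driver_id = 100
ORDER BY "time" DESC
LIMIT 1;
```

If the composite index exists but is not chosen, a common (if inelegant) trick is to make the ORDER BY unusable for the single-column index, e.g. `ORDER BY "time" DESC, driver_id DESC`; whether this steers the planner as intended should be verified with EXPLAIN on the affected version.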
Attachments:
  test_bad_index_choice.sql
  bad_idx_choice.9.6.out
  bad_idx_choice.9.5.out
  bad_idx_choice.9.3.out
However, when I gave this strange distribution (~30% of all rows concentrated in one value) to more than one value, the bad index choice did not happen again on any of the versions.

I hope this helps.

Regards,
Daniel Blanch.

> On 10 Dec 2016, at 21:34, Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 12/10/2016 12:51 AM, Tom Lane wrote:
>> Eric Jiang <eric@xxxxxxxxxxxxx> writes:
>>> I have a query that I *think* should use a multicolumn index, but
>>> sometimes isn't, resulting in slow queries.
>>
>> I tried to duplicate this behavior, without success. Are you running
>> with nondefault planner parameters?
>>
>
> My guess is this is a case of LIMIT where the planner assumes the matching rows are uniformly distributed in the input data. It likely concludes that for a driver with a lot of data we'll find the first row using ix_updates_time very quickly, and that this will be cheaper than inspecting the larger multi-column index. But imagine a driver with lots of data, all of it a long time ago. That breaks the LIMIT estimate fairly badly.
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
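The skewed distribution Tomas describes can be reproduced synthetically. This is a hedged sketch (the schema is guessed from the plan above, and column names besides driver_id and "time" are mine): one driver gets millions of old rows, while all recent rows belong to other drivers, so a backward scan on the time index must skip every recent row before hitting the target driver.

```sql
-- Hypothetical reproduction schema, inferred from the plan in this thread.
CREATE TABLE updates (driver_id int, "time" timestamptz, payload text);

-- 6M old rows for driver 100 ("lots of data, long time ago") ...
INSERT INTO updates
SELECT 100, now() - interval '2 years' + i * interval '1 second', 'x'
FROM generate_series(1, 6000000) AS i;

-- ... followed by 9M newer rows spread across other drivers, matching the
-- "Rows Removed by Filter: 9000000" seen in the EXPLAIN output.
INSERT INTO updates
SELECT 1 + (i % 99), now() - interval '1 year' + i * interval '1 second', 'x'
FROM generate_series(1, 9000000) AS i;

CREATE INDEX updates_time_idx ON updates ("time");
ANALYZE updates;
```

Running the LIMIT 1 query from the start of this message against such data should show whether a given version falls into the same backward-scan trap.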