RE: select query does not pick up the right index

Abadie Lana <Lana.Abadie@xxxxxxxx> · Thu, 3 Jan 2019 12:57:27 +0000

Lana ABADIE
Database Engineer
CODAC Section

ITER Organization, Building 72/4108, SCOD, Control System Division
Route de Vinon-sur-Verdon - CS 90 046 - 13067 St Paul Lez Durance Cedex - France
Phone: +33 4 42 17 84 02
Get the latest ITER news on http://www.iter.org/whatsnew

-----Original Message-----
From: David Rowley <david.rowley@xxxxxxxxxxxxxxx> 
Sent: 03 January 2019 01:16
To: Abadie Lana <Lana.Abadie@xxxxxxxx>
Cc: pgsql-performance@xxxxxxxxxxxxxxxxxxxx
Subject: Re: select query does not pick up the right index

On Thu, 3 Jan 2019 at 05:28, Abadie Lana <Lana.Abadie@xxxxxxxx> wrote:
> I would appreciate any hints as this problem looks to me rather strange…I tried to google it but in vain.
>
> select t.name, c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val from sample c, channel t where t.channel_id=c.channel_id and t.name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW'   order by c.smpl_time desc limit 5;
>
> takes 20mn to execute because it picks up the wrong index…see explain analyse below. I would expect this query to use the (channel_id,smpl_time) but it uses the smpl_time index.

[...]

> Any ideas, why the planner is not taking the right index?

The planner assumes that the required channel values are evenly distributed through the scan of the index on smpl_time.  If your required 5 rows were found quickly (i.e channels with recent sample values), then the plan would have worked out well.  It looks like 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW' is probably a channel which has some very old sample values. I can see that from "Rows Removed by Join
Filter: 322099471",  meaning that on backwards scanning the smpl_time index, that many rows were found not to match the channel you requested.

The planner, by default only has statistics to say how common each channel is in the sample table. I think in this case since the planner has no knowledge of which channel_id it will be searching for (that's only determined during execution), then I suppose it must be using the n_distinct of the sample.channel_id table.  It would be interesting to know how far off the n_distinct estimation is. You can find out with:

select stadistinct from pg_statistic where starelid='sample'::regclass and staattnum = 1; select count(*) from (select distinct channel_id from sample) s; -- this may take a while to run...

If the stadistinct estimate is far out from the reality, then you could consider setting this manually with:

alter table sample alter column channel_id set (n_distinct = <actual value here>);

but keep in mind, that as the table evolves, whatever you set there could become outdated.

Another method to fix you could try would be to coax the planner into doing something different would be to give it a better index to work with.

create index on channel(name, channel_id);

You didn't show us the details from the channel table, but if there's not an index like this then this might reduce the cost of a Merge Join, but since the order rows output from that join would be in channel_id order, a Sort would be required, which would require joining all matching rows, not just the first 5 matches. Depending on how many rows actually match will determine if that's faster or not.

If you don't have luck with either of the above then, one other thing you could try would be to disallow the planner from using the smpl_time index by changing the order by to "ORDER BY c.smpl_time +
INTERVAL '0 sec';   that's a bit of a hack, but we don't have anything
we officially call "query hints" in PostgreSQL, so often we're left to solve issues like this with ugly tricks like that.

Also, going by:

> ->  Seq Scan on channel t  (cost=0.00..915.83 rows=1 width=41) (actual 
> -> time=4.683..7.885 rows=1 loops=1)

perhaps "name" is unique on the channel table?  (I doubt there's an index/constraint to back that up, however, since such an index would have likely been used here instead of the Seq Scan)

If so, and you can add a constraint to back that up, you might  be able to reform the query to be:

select 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW',
c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val
from sample c
WHERE c.channel_id = (SELECT channel_id FROM channel WHERE
name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW')
order by c.smpl_time desc limit 5;

If you can do that then it's highly likely to be *very* fast to execute since I see there's an index on (channel_id, smpl_time) on each of the inherited tables.

(If our planner was smarter then in the presence of the correct unique index, we could have rewritten the query as such automatically.... but it's not / we don't.  I believe I've mentioned about improving this somewhere in the distant past of the -hackers mailing list, although I can't find it right now. I recall naming the idea "scalar value lookup joins", but development didn't get much beyond thinking of that name)

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Hi David
Thanks for your tips. So answers to your questions/comments
1) About n_distinct
first query returns 2136, second query returns 33425. So it seems that there is some discrepancies...Also the sample table is very big...roughly 322099474 rows.
I did the alter statement but without success. Still long execution time with wrong index
2) index on channel(name,channel_id)
There was no indexes on channel. So I created it. Same execution time, still wrong index used regardless of the n_distinct values
3)The "trick" (+ interval '0s') did the job. The index on channel_id, smpl_time is used. Query time can vary between a few ms to 25 sec
4) name is unique, constraint and index created. Right index is picked up and query time is rather constant there 40sec.

A few comments : 
- I have disabled force_parallel_mode when running all the tests. 
- The difference between the two plans is in the case of query with the trick, the planner is using a bitmap index scan, in the second one it uses index scan backward.
- when I execute the initial query, there is a big read access on disk almost 17.7 GB...whereas the total size of the smpl_time index is roughly 7GB...Could it be a wrong configuration on my side?
During the tests, no insert/delete/or update was performed...only my select queries...
Main parameters : effective_cache_size : 4GB, shared_buffers 4GB, work_mem 4MB

Thanks a lot !