Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.

Emil Iggland <emil.iggland@xxxxxxxxxxx> · Tue, 26 Apr 2022 09:41:52 +0200

> You've got the wrong column order (for this query anyway) in that
> index.  It'd work a lot better if dataview were the first column;
I might be misunderstanding you, but I assume that you are suggesting an 
index on (dataview, valuetimestamp).
We have that index, it is the primary key. For some reason it isn't 
being selected.

I can understand that it has to go through the whole index, potentially 
even the whole table, but I do not why it takes so long.

Even a query that should take equally long (probably longer) is 
substantially faster:

explain (analyze, buffers)
select valuetimestamp from datavalue
where valuetimestamp <> '1965-01-07 05:50:59';

Completes in less than 500ms using a sequential scan,

...
->  Seq Scan on datavalue_2022_04 datavalue_7  (cost=0.00..1450.39 
rows=56339 width=8) (actual time=0.013..5.988 rows=56109 loops=1)"
	Filter: (valuetimestamp <> '1965-01-07 05:50:59'::timestamp without 
time zone)
	Buffers: shared hit=742 read=4
...
Planning Time: 0.781 ms
Execution Time: 394.408 ms

while the original query takes over 1 second.
...
->  Index Scan Backward using 
datavalue_2022_04_valuetimestamp_dataview_idx on datavalue_2022_04 
datavalue_7  (cost=0.29..4292.48 rows=56351 width=227) (actual 
time=0.166..17.340 rows=56109 loops=1)
	Buffers: shared hit=42013 read=278
...
Planning Time: 0.964 ms
Execution Time: 1291.509 ms

I do not understand how looking at every value in the index and 
returning none be slower than looking at every table in the table and 
returning none. If it takes 500ms to return every value in the table via 
a sequential scan, then it should take less via an index scan.

In case we never solve it, and someone else runs into similiar problems, 
we (hopefully temporarily) worked around it by reformulating the query 
to use a lateral join:

EXPLAIN (analyze, buffers)
SELECT dv.* FROM valueseries vs
LEFT JOIN LATERAL (
	SELECT * FROM datavalue dv WHERE dv.dataview = vs.id
	ORDER BY VALUETIMESTAMP
	FETCH FIRST 1 ROWS ONLY
) dv ON TRUE
where vs.channel = 752433

This causes it to use the correct index:
->  Index Scan using datavalue_2022_01_pkey on datavalue_2022_01 dv_4 
(cost=0.42..2951.17 rows=1032 width=228) (actual time=0.034..0.034 
rows=0 loops=1)
	Index Cond: (dataview = vs.id)
	Buffers: shared read=3
...
Planning Time: 1.169 ms
Execution Time: 0.524 ms

Regards
Emil

On 2022-04-25 18:00, Tom Lane wrote:
Emil Iggland <emil.iggland@xxxxxxxxxxx> writes:
The query that is giving us issues is the following, channel 752433 has
NO values, 752431 has values.
(Channel 752433 only has valueseries 752434)

select * from datavalue
where dataview in ( select id from valueseries where channel =
%channel_idx%)
ORDER BY VALUETIMESTAMP DESC
FETCH FIRST ROW only;

Running explain analyze shows strange numbers, 52'000 rows are being
returned but there are no rows there.

For channel 752433
->  Index Scan Backward using
datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03
datavalue_6  (cost=0.42..7166.19 rows=119673 width=226) (actual
time=0.008..32.831 rows=119601 loops=1)

You've got the wrong column order (for this query anyway) in that
index.  It'd work a lot better if dataview were the first column;
or at least, it wouldn't tempt the planner to try this unstably-
performing plan.  It's trying to use the index ordering to satisfy
the ORDER BY, which works great as long as it finds a dataview
match in some reasonably recent index entry.  Otherwise, it's
going to crawl the whole index to discover that there's no match.

			regards, tom lane