Re: Interesting slow query

PFC <lists@xxxxxxxxxx> · Tue, 13 Jun 2006 13:56:47 +0200

Usually we get complaints the other way around (that the NOT EXISTS
approach is a lot slower).

	Yes, I know ;)
	(I rephrased the query this way to exploit the fact that the planner  
would choose a nested loop)

You did not show any statistics, but I
suspect the key point here is that the condition id > 1130306 excludes
most or all of the A and D tables.

	Right.

	Actually :
	- Table r (raw_annonces) contains raw data waiting to be processed
	- Table a (annonces) contains processed data ready for display on the  
website (active data)
	- Table d (archive) contains old archived data which can be displayed on  
request but is normally excluded from the searches, which normally only  
hit recent records. This is to get speedy searches.

	So, records are added into the "raw" table, these have a SERIAL primary  
key.
	Then a script processes them and inserts the results into the active  
table. 15 days of "raw" records are kept, then they are deleted.
	Periodically old records from "annonces" are moved to the archive.

	The promary key stays the same in the 3 tables.

	The script knows at which id it stopped last time it was run, hence the  
(id > x) condition. Normally this excludes the entire "annonces" table,  
because we process only new records.

The planner is not smart about
making transitive inequality deductions, but you could help it along
by adding the implied clauses yourself:

EXPLAIN ANALYZE SELECT r.* FROM raw_annonces r LEFT JOIN annonces a ON  
(a.id=r.id AND a.id > 1180726) LEFT JOIN archive_data d ON (d.id=r.id AND  
d.id > 1180726) WHERE a.id IS NULL AND d.id IS NULL AND r.id > 1180726  
order by id limit 1;

Whether this is worth doing in your app depends on how often you do
searches at the end of the ID range ...

	Quite often actually, so I did the mod.
	The interesting part is that, yesterday after ANALYZE the query plan was  
horrible, and today, after adding new data I ANALYZED and retried the slow  
query, and it was fast again :

EXPLAIN ANALYZE SELECT r.* FROM raw_annonces r LEFT JOIN annonces a ON  
(a.id=r.id) LEFT JOIN archive_data d ON (d.id=r.id) WHERE a.id IS NULL AND  
d.id IS NULL AND r.id > 1180726 order by id limit 1;
                                                                        QUERY  
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..10.42 rows=1 width=631) (actual time=0.076..0.076  
rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..7129.11 rows=684 width=631)  
(actual time=0.074..0.074 rows=1 loops=1)
         Filter: ("inner".id IS NULL)
         ->  Nested Loop Left Join  (cost=0.00..4608.71 rows=684  
width=631) (actual time=0.064..0.064 rows=1 loops=1)
               Filter: ("inner".id IS NULL)
               ->  Index Scan using raw_annonces_pkey on raw_annonces r   
(cost=0.00..667.56 rows=684 width=631) (actual time=0.013..0.013 rows=1  
loops=1)
                     Index Cond: (id > 1180726)
               ->  Index Scan using annonces_pkey on annonces a   
(cost=0.00..5.75 rows=1 width=4) (actual time=0.046..0.046 rows=0 loops=1)
                     Index Cond: (a.id = "outer".id)
         ->  Index Scan using archive_data_pkey on archive_data d   
(cost=0.00..3.67 rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=1)
               Index Cond: (d.id = "outer".id)
 Total runtime: 0.197 ms

	So I did a few tests...

CREATE TABLE test.raw (id INTEGER PRIMARY KEY);
CREATE TABLE test.active (id INTEGER PRIMARY KEY);
CREATE TABLE test.archive (id INTEGER PRIMARY KEY);
INSERT INTO test.archive SELECT * FROM generate_series( 1, 1000000 );
INSERT INTO test.active SELECT * FROM generate_series( 1000001, 1100000 );
INSERT INTO test.raw SELECT * FROM generate_series( 1050000, 1101000 );
VACUUM ANALYZE;

So we have 1M archived records, 100K active, 51K in the "raw" table of  
which 1000 are new.

	Query 1:

EXPLAIN ANALYZE SELECT * FROM test.raw AS raw LEFT JOIN test.active AS  
active ON (active.id=raw.id) LEFT JOIN test.archive AS archive ON  
(archive.id=raw.id) WHERE raw.id>1100000 AND active.id IS NULL AND  
archive.id IS NULL LIMIT 1;
                                                                     QUERY  
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..5.29 rows=1 width=12) (actual time=94.478..94.478  
rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..5400.09 rows=1021 width=12)  
(actual time=94.477..94.477 rows=1 loops=1)
         Filter: ("inner".id IS NULL)
         ->  Merge Left Join  (cost=0.00..2310.55 rows=1021 width=8)  
(actual time=94.458..94.458 rows=1 loops=1)
               Merge Cond: ("outer".id = "inner".id)
               Filter: ("inner".id IS NULL)
               ->  Index Scan using raw_pkey on raw  (cost=0.00..24.78  
rows=1021 width=4) (actual time=0.016..0.016 rows=1 loops=1)
                     Index Cond: (id > 1100000)
               ->  Index Scan using active_pkey on active   
(cost=0.00..2023.00 rows=100000 width=4) (actual time=0.005..76.572  
rows=100000 loops=1)
         ->  Index Scan using archive_pkey on archive  (cost=0.00..3.01  
rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=1)
               Index Cond: (archive.id = "outer".id)
 Total runtime: 94.550 ms

	Query 2:

EXPLAIN ANALYZE SELECT * FROM test.raw AS raw LEFT JOIN test.active AS  
active ON (active.id=raw.id AND active.id>1100000) LEFT JOIN test.archive  
AS archive ON (archive.id=raw.id AND archive.id > 1100000) WHERE  
raw.id>1100000 AND active.id IS NULL AND archive.id IS NULL LIMIT 1;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.04 rows=1 width=12) (actual time=0.035..0.035 rows=1  
loops=1)
   ->  Merge Left Join  (cost=0.00..37.67 rows=1021 width=12) (actual  
time=0.034..0.034 rows=1 loops=1)
         Merge Cond: ("outer".id = "inner".id)
         Filter: ("inner".id IS NULL)
         ->  Merge Left Join  (cost=0.00..30.51 rows=1021 width=8) (actual  
time=0.026..0.026 rows=1 loops=1)
               Merge Cond: ("outer".id = "inner".id)
               Filter: ("inner".id IS NULL)
               ->  Index Scan using raw_pkey on raw  (cost=0.00..24.78  
rows=1021 width=4) (actual time=0.016..0.016 rows=1 loops=1)
                     Index Cond: (id > 1100000)
               ->  Index Scan using active_pkey on active   
(cost=0.00..3.14 rows=10 width=4) (actual time=0.006..0.006 rows=0 loops=1)
                     Index Cond: (id > 1100000)
         ->  Index Scan using archive_pkey on archive  (cost=0.00..4.35  
rows=100 width=4) (actual time=0.007..0.007 rows=0 loops=1)
               Index Cond: (id > 1100000)
 Total runtime: 0.101 ms

	OK, you were right ;)

	Query 3:

EXPLAIN ANALYZE SELECT * FROM test.raw AS raw WHERE raw.id > 1100000 AND  
NOT EXISTS (SELECT 1 FROM test.active AS a WHERE a.id=raw.id) AND NOT  
EXISTS (SELECT 1 FROM test.archive AS a WHERE a.id=raw.id) LIMIT 1;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..24.23 rows=1 width=4) (actual time=0.036..0.036 rows=1  
loops=1)
   ->  Index Scan using raw_pkey on raw  (cost=0.00..6178.35 rows=255  
width=4) (actual time=0.035..0.035 rows=1 loops=1)
         Index Cond: (id > 1100000)
         Filter: ((NOT (subplan)) AND (NOT (subplan)))
         SubPlan
           ->  Index Scan using archive_pkey on archive a   
(cost=0.00..3.01 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1)
                 Index Cond: (id = $0)
           ->  Index Scan using active_pkey on active a  (cost=0.00..3.01  
rows=1 width=0) (actual time=0.008..0.008 rows=0 loops=1)
                 Index Cond: (id = $0)
 Total runtime: 0.086 ms

	I see a problem with Query 1:
	The Merge Join goes through tables "raw" and "active" in sorted order.
"archive" contains values 1-1000000
"active" contains values 1000001-1100000
"raw" contains values 1050000-1101000

	However it starts at the beginning of "active" ; it would be smarter to  
start the index scan of "active" at the lowest value in "raw", ie. to seek  
into the right position into the index before beginning to scan it. This  
is achieved by your advice on manually adding the "id > x" conditions in  
the query.

	However, if I want to join the full tables, dropping the id>x condition :

EXPLAIN ANALYZE SELECT * FROM test.raw AS raw LEFT JOIN test.active AS  
active ON (active.id=raw.id) LEFT JOIN test.archive AS archive ON  
(archive.id=raw.id) WHERE active.id IS NULL AND archive.id IS NULL;
                                                                  QUERY  
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.00..27305.04 rows=51001 width=12) (actual  
time=837.196..838.099 rows=1000 loops=1)
   Merge Cond: ("outer".id = "inner".id)
   Filter: ("inner".id IS NULL)
   ->  Merge Left Join  (cost=0.00..3943.52 rows=51001 width=8) (actual  
time=153.495..154.190 rows=1000 loops=1)
         Merge Cond: ("outer".id = "inner".id)
         Filter: ("inner".id IS NULL)
         ->  Index Scan using raw_pkey on raw  (cost=0.00..1033.01  
rows=51001 width=4) (actual time=0.012..23.085 rows=51001 loops=1)
         ->  Index Scan using active_pkey on active  (cost=0.00..2023.00  
rows=100000 width=4) (actual time=0.004..47.333 rows=100000 loops=1)
   ->  Index Scan using archive_pkey on archive  (cost=0.00..20224.00  
rows=1000000 width=4) (actual time=0.043..501.953 rows=1000000 loops=1)
 Total runtime: 838.272 ms

	This is very slow : the Index Scans on "active" and "archive" have to  
skip a huge number of rows before getting to the first interesting row. We  
know that rows in "active" and "archive" will be of no use if their id is  
< (SELECT min(id) FROM test.raw) which is 1050000. Let's rephrase :

EXPLAIN ANALYZE SELECT * FROM test.raw AS raw LEFT JOIN test.active AS  
active ON (active.id=raw.id AND active.id >= 1050000) LEFT JOIN  
test.archive AS archive ON (archive.id=raw.id AND archive.id >= 1050000)  
WHERE active.id IS NULL AND archive.id IS NULL;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.00..2837.93 rows=51001 width=12) (actual  
time=114.590..115.451 rows=1000 loops=1)
   Merge Cond: ("outer".id = "inner".id)
   Filter: ("inner".id IS NULL)
   ->  Merge Left Join  (cost=0.00..2705.78 rows=51001 width=8) (actual  
time=114.576..115.239 rows=1000 loops=1)
         Merge Cond: ("outer".id = "inner".id)
         Filter: ("inner".id IS NULL)
         ->  Index Scan using raw_pkey on raw  (cost=0.00..1033.01  
rows=51001 width=4) (actual time=0.012..51.505 rows=51001 loops=1)
         ->  Index Scan using active_pkey on active  (cost=0.00..1158.32  
rows=50913 width=4) (actual time=0.009..22.312 rows=50001 loops=1)
               Index Cond: (id >= 1050000)
   ->  Index Scan using archive_pkey on archive  (cost=0.00..4.35 rows=100  
width=4) (actual time=0.012..0.012 rows=0 loops=1)
         Index Cond: (id >= 1050000)
 Total runtime: 115.601 ms

	So here's my point : the first operation in the Index Scan in a merge  
join could be to seek to the right position in the index before scanning  
it. This value is known : it is the first value yielded by the index scan  
on "raw".

	This would remove the need for teaching the planner about transitivity,  
and also optimize this case where transitivity is useless.