Hi Tom,
I have now completed the move to PG8.0.3, and feel that I have confirmed
that this problem is related to the problem I'm having:
Formulated like this, it is not performing:
SELECT station_id, timeobs,temp_grass, temp_dry_at_2m
FROM temp_dry_at_2m a
FULL OUTER JOIN temp_grass b
USING (station_id, timeobs)
WHERE station_id = 52981
AND timeobs = '2004-1-1 0:0:0';
Merge Full Join (cost=1598312.83..11032924.48 rows=6956994 width=32) (actual time=119061.098..133314.306 rows=1 loops=1)
Merge Cond: (("outer".timeobs = "inner".timeobs) AND ("outer".station_id = "inner".station_id))
Filter: ((COALESCE("inner".station_id, "outer".station_id) = 52981) AND (COALESCE("inner".timeobs, "outer".timeobs) = '2004-01-01 00:00:00'::timestamp without time zone))
-> Sort (cost=346429.38..352445.11 rows=2406292 width=16) (actual time=20315.241..23850.529 rows=2406292 loops=1)
Sort Key: b.timeobs, b.station_id
-> Seq Scan on temp_grass b (cost=0.00..41756.92 rows=2406292 width=16) (actual time=10.517..7003.468 rows=2406292 loops=1)
-> Sort (cost=1251883.44..1269275.93 rows=6956994 width=16) (actual time=82122.354..92027.850 rows=6956994 loops=1)
Sort Key: a.timeobs, a.station_id
-> Seq Scan on temp_dry_at_2m a (cost=0.00..117549.94 rows=6956994 width=16) (actual time=23.759..39930.741 rows=6956994 loops=1)
Total runtime: 133623.422 ms
But Postgresql can do the work, if it is reformulated into:
SELECT station_id, timeobs, temp_grass, temp_dry_at_2m
FROM
(SELECT station_id, timeobs, temp_dry_at_2m
FROM temp_dry_at_2m
WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') a
FULL OUTER JOIN
(SELECT station_id, timeobs, temp_grass
FROM temp_grass
WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') b
USING (station_id, timeobs)
Merge Full Join (cost=0.00..43023.64 rows=10614 width=32) (actual time=0.056..0.064 rows=1 loops=1)
-> Index Scan using temp_grass_idx on temp_grass (cost=0.00..246.55 rows=61 width=16) (actual time=0.029..0.031 rows=1 loops=1)
Index Cond: ((timeobs = '2004-01-01 00:00:00'::timestamp without time zone) AND (station_id = 52981))
-> Index Scan using temp_dry_at_2m_idx on temp_dry_at_2m (cost=0.00..699.52 rows=174 width=16) (actual time=0.017..0.020 rows=1 loops=1)
Index Cond: ((timeobs = '2004-01-01 00:00:00'::timestamp without time zone) AND (station_id = 52981))
Total runtime: 0.163 ms
The reason the first query is not performing is because the query
optimizer does not push the conditions down into the sub-queries - right??
The reason that I do not just use the reformulated query, is that e.g.
the station_id comes from another table (and there can be more of them),
so it is bloody inconvenient to first select them, and then repeat them
a number of time in the above transformation (I need to outer join more
than two tables) ........
Best regards,
Kim Bisgaard wrote:
Hi Tom,
This sounds like the same "problem" which prevented PG from using the
indices, and thus giving abyssmal performance in this other thread:
I have two BIG tables (virtually identical) with 3 NOT NULL columns
Station_id, TimeObs, Temp_XXXX, with unique indexes on (Station_id,
TimeObs) and valid ANALYSE (set statistics=100). I want to join the
two tables with a FULL OUTER JOIN.
When I specify the query as:
SELECT station_id, timeobs,temp_grass, temp_dry_at_2m
FROM temp_dry_at_2m a
FULL OUTER JOIN temp_grass b USING (station_id, timeobs)
WHERE station_id = 52981
AND timeobs = '2004-1-1 0:0:0'
Then I would also vote for improving the inteligence of the optimizer!
:-)
Regards,
Kim.
Tom Lane wrote:
Phil Endecott <spam_from_postgresql_general@xxxxxxxxxxxx> writes:
I don't see anything in there about LEFT OUTER JOIN though. Any ideas?
Oh, I missed that part of your message. Hmm, I think the issue is
that in
D join (M join G on (M.g=G.id)) on (D.id=M.b) where D.id=nnn
the planner deduces M.b=nnn by transitivity, but when the join is an
outer join it can't make the same deduction.
[ thinks some more... ] If we distinguished conditions that hold below
the join from those that hold above it, we could deduce that M.b=nnn can
be enforced below the join even though it might not be true above it.
There's no such mechanism in existence now, though.
A possible workaround is to generate your query like
D left join (M join G on (M.g=G.id)) on (D.id=M.b AND M.b=nnn) where
D.id=nnn
but I don't know how practical that is for you.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend