On Sat, Jun 8, 2013 at 8:06 AM, Enrico Pirozzi <sscotty71@xxxxxxxxx> wrote:
Hi all,
I have 2 tables:
table1 (field1,field2,.....)
table2 (field1,field2,field3,.....)
field1 is an uuid typeIs it a primary key? Is there a foreign key constraint between them?field2 is a timestamp with time zone type.
If I execute:
explain (analyze on, timing off)
select B.field3,A.field1,A.field2
FROM table1 A INNER JOIN table2 B
using (field1,field2)
the query plan is
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Hash Join (cost=137324.20..247200.77 rows=1 width=31) (actual rows=1136175 loops=1)
Hash Cond: ((b.field1 = a.field1) AND (b.field2 = a.field2))
-> Seq Scan on table2 b (cost=0.00..49694.75 rows=1136175 width=31) (actual rows=1136175 loops
=1)
-> Hash (cost=89610.68..89610.68 rows=2287368 width=24) (actual rows=1143684 loops=1)
Buckets: 65536 Batches: 4 Memory Usage: 15699kB
-> Seq Scan on table1 a (cost=0.00..89610.68 rows=2287368 width=24) (actual rows=1143684 loops=1
)
Total runtime: 5055.118 ms
(7 rows)
My question is
Why Have I a rows=1 in the explain and rows=1136175 in the explain analyze?
Yes, that seems quite strange.
On further thought, that is not strange at all. You have two very selective join conditions, and the planner assumes they are independent, so that it can multiply the selectivities. But in reality they are completely (or almost completely) dependent. If the planner knew about cross column correlations, that might not even help as you can have complete statistical dependence without having correlation.
It seems unlikely to me that the timestamp belongs in both tables, since it's value seems to be completely dependent on the value of the UUID.
In any event, it is unlikely the planner would pick a different plan were it to correctly understand the selectivities, so no harm is done. Although it is easy to imagine similar queries where that would not be the case.
Cheers,
Jeff