Internal operations when the planner makes a hash join.

negora <negora@xxxxxxxxxx> · Tue, 23 Feb 2010 15:38:13 +0100

Hello:

I'm an ignorant in what refers to performance analysis of PostgreSQL. 
I've a doubt about how the PostgreSQL planner makes a hash join. I've 
tried to "dig" into the archive of this mailing list but I haven't found 
what I'm looking for. So I'm explaining my doubt with an example to see 
if anyone can help me.

Let's suppose that I've 2 tables, one of students and the other one of 
parents in a many-to-one relation. I want to do something like this:

       SELECT s.complete_name, f.complete_name
       FROM students AS s
       JOIN fathers AS f ON f.id_father = s.id_father;

Using the ANALYZE command, I've checked that the planner firstly scans 
and extracts the required information from "fathers", builds a temporary 
hash table from it, then scans "students", and finally joins the 
information from this table and the temporary one employing the relation 
"f.id_father = s.id_father".

My doubt is about this last step. When the planner checks the temporary 
table looking for the parent of a student:

A) Does it run through the temporary father's table one time per 
student? This means that if there are 500 students, it's doing 500 loops 
on the temporary table.
B) Or does it try to internally group students with the same father ID 
to avoid doing "absurd" loops on the temporary one?

That's all. Thank you very much for your kindness :) .

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance