Re: Hash Join performance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Fri, Mar 13, 2009 at 7:08 PM, Tom Lane wrote:
Vamsidhar Thummala writes:
> I am wondering why are we subtracting the entire Seq Scan time of Lineitem
> from the total time to calculate the HashJoin time.

Well, if you're trying to identify the speed of the join itself and not
how long it takes to provide the input for it, that seems like a
sensible calculation to make.

I am still not clear on this. I am thinking the output is produced in a pipelined fashion i.e., as soon as the record of outer child is read (sequentially here) and if HashJoin finds a match by probing the inner hash table (in memory), we have an output record. Please correct if I am wrong here.
 


> Here is another plan I have for the same TPC-H 18 query with different
> configuration parameters (shared_buffers set to 400MB, just for experimental
> purposes) and HashJoin seems to take longer time (at least 155.58s based on
> above calculation):

Yeah, that seems to work out to about 25us per row instead of 3us, which
is a lot slower.  Maybe the hash got split up into multiple batches ...
what have you got work_mem set to?  Try turning on log_temp_files and
see if it records any temp files as getting created.

Unfortunately, I am working with Postgres 8.2 which doesn't have log_temp_files. The work_mem is still at 1MB (all other parameters were kept constant apart from shared_buffers w.r.t previous configuration). The hash is build on 57 records (~20kb, customer row length is 179 bytes and orders row length is 104 bytes) produced by inner subplan and so I will be surprised if multiple batches are created.

Thank you.

Regards,
-Vamsi

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux