Re: hash join vs nested loop join

"Kevin Grittner" <kgrittn@xxxxxxxx> · Thu, 20 Dec 2012 09:06:29 -0500

Huan Ruan wrote:
> Kevin Grittner wrote:

>> Frankly, at 12 microseconds per matched pair of rows, I think
>> you're doing OK.
>
> This plan is the good one, I want the indexscan nested loop join and this
> is only achieved after making all these costing factors change. Before
> that, it was hash join and was very slow.
> 
> However, I'm worried about the config changes being too 'extreme', i.e.
> both sequential I/O and random I/O have the same cost and being only 0.1.
> So, I was more wondering why I have to make such dramatic changes to
> convince the optimiser to use NL join instead of hash join. And also, I'm
> not sure what impact will these changes have on other queries yet. e.g.
> will a query that's fine with hash join now choose NL join and runs slower?

I understand the concern, but PostgreSQL doesn't yet have a knob to
turn for "cache hit ratio". You essentially need to build that into
the page costs. Since your cache hit ratio (between shared buffers
and the OS) is so high, the cost of page access relative to CPU
costs has declined and there isn't any effective difference between
sequential and random access. As the level of caching changes, you
may need to adjust. In one production environment where there was
significant caching, but far enough from 100% to matter, we tested
various configurations and found the fastest plans being chosen
with seq_page_cost = 0.3 and random_page_cost = 0.5. Tune to your
workload.

-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance