On Tue, Mar 25, 2014 at 2:00 PM, Brian Crowell <brian@xxxxxxxxxx> wrote:
Hello, it's me, a Postgres n00b again. I'm dealing with a query that
scans a rather large table (94,000,000 tuples or so) and just picks
out certain rows and sums them:
select dci.snapshot_time as "time",
       round(sum(dci.todays_pl)::numeric, 0) as pl
from dbo._pl_data_cache_intraday dci
where dci.snapshot_time between '2014-03-25 11:32:40.004552-05'::timestamptz
                            and '2014-03-25 12:02:40.015177-05'::timestamptz
  and dci.symbol in (select sec.symbol
                     from dbo.security_underliers sec
                     where sec.ultimate_underlier = 'SPY')
  and dci.manager = 'BJC'
  and dci.account in (select account from pl2.visible_accounts where is_fund)
group by dci.snapshot_time
order by dci.snapshot_time;
For the most part, Postgres is doing the right thing: snapshot_time is
the lead column in all of the table's indexes, so it's able to pick up
the source rows fairly quickly in its index scan. It also enforces
"dci.manager = 'BJC'" in the same scan, and it uses a Hash Semi Join
for "dci.symbol in (...)".
The trouble comes when enforcing the "dci.account in (...)" search
condition: pl2.visible_accounts is a view that determines which
accounts the current user can see, which, depending on who you are,
can be several hundred accounts or none at all. Postgres estimates the
output of this subquery as two rows, but in my case, it's actually
240. Unfortunately, that leads the query planner to think a nested
loop is cheap enough to enforce this condition, when in fact it's
really expensive.
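The mismatch shows up if you explain the subquery on its own; a
minimal check, using the same names as the query above:

explain analyze
select account from pl2.visible_accounts where is_fund;

-- the top node's estimated row count (rows=2) vs. the actual count
-- (rows=240) is the discrepancy described above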
Can you show the explain plan for that? I can't get it to use anything
but a hash join for this type of thing even when the estimated rows in
the in-list are 2, unless I disable hash joins altogether. So I'm
curious how your plan differs from the ones I've dummied up.
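For reference, "disable hash joins" here just means the stock planner
setting, in case you want to reproduce the nested-loop plans I was
dummying up:

-- for testing only: steer the planner away from hash joins, capture
-- the plan, then restore the default
set enable_hashjoin = off;
explain (analyze, buffers) select 1;  -- substitute the full query above
reset enable_hashjoin;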
Cheers,
Jeff