Search Postgresql Archives

Re: PG choosing nested loop for set membership?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Brian Crowell wrote
> Hello, it's me, a Postgres n00b again. I'm dealing with a query that
> scans a rather large table (94,000,000 tuples or so) and just picks
> out certain rows and sums them:
> 
> select dci.snapshot_time as "time", round(sum(dci.todays_pl)::numeric,0)
> as pl
> from dbo._pl_data_cache_intraday dci
> where dci.snapshot_time between '2014-03-25
> 11:32:40.004552-05'::timestamptz and '2014-03-25
> 12:02:40.015177-05'::timestamptz
>     and dci.symbol in (select sec.symbol from dbo.security_underliers
> sec where sec.ultimate_underlier = 'SPY')
>     and dci.manager = 'BJC'
>     and dci.account in (select account from pl2.visible_accounts where
> is_fund)
> group by dci.snapshot_time
> order by dci.snapshot_time;
> 
> For the most part, Postgres is doing the right thing: snapshot_time is
> the lead column in all of the table's indexes, so it's able to pick up
> the source rows fairly quickly in its index scan. It's also enforcing
> "dci.manager = 'BJC'" in the same scan, and does a Hash Semi Join for
> "dci.symbol in (...)".
> 
> The trouble comes when enforcing the "dci.account in (...)" search
> condition: pl2.visible_accounts is a view that determines which
> accounts the current user can see, which, depending on who you are,
> can be several hundred or none at all. Postgres estimates the output
> of this query as two rows, but in my case, it's actually 240.
> 
> Unfortunately, that leads the query planner to try to think a nested
> loop is cheap enough to enforce this, when actually it's really
> expensive.
> 
> If I hard-code the results from pl2.visible_accounts, Postgres will do
> a hash semi join for me, which is much faster, but then I have to wrap
> up this whole query as a function in order to preserve its security
> properties. Not only is that the situation I was trying to avoid, it
> means I can't use EXPLAIN for my query anymore.
> 
> I've noticed I can also do the really sneaky "dci.account in (select
> unnest(array_agg(account)) from pl2.visible_accounts)", which tricks
> the estimator into thinking there will be 100 rows. That _really_
> feels like cheating.
> 
> Besides the above, is there anything I can do to get Postgres to do a
> hash instead of a nested loop?

1) Try using EXISTS instead of IN
2 - and the one I'd use by default) Use an INNER JOIN

SELECT ...
FROM ... dci 
JOIN (SELECT account FROM ... WHERE is_fund) accts USING (account)
JOIN (SELECT symbol FROM ... WHERE ... = 'SPY') sec USING (symbol)
WHERE ...

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/PG-choosing-nested-loop-for-set-membership-tp5797457p5797459.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux