Brian Crowell wrote:
> Hello, it's me, a Postgres n00b again. I'm dealing with a query that
> scans a rather large table (94,000,000 tuples or so) and just picks
> out certain rows and sums them:
>
> select dci.snapshot_time as "time",
>        round(sum(dci.todays_pl)::numeric,0) as pl
> from dbo._pl_data_cache_intraday dci
> where dci.snapshot_time between '2014-03-25 11:32:40.004552-05'::timestamptz
>                             and '2014-03-25 12:02:40.015177-05'::timestamptz
>   and dci.symbol in (select sec.symbol from dbo.security_underliers sec
>                      where sec.ultimate_underlier = 'SPY')
>   and dci.manager = 'BJC'
>   and dci.account in (select account from pl2.visible_accounts where is_fund)
> group by dci.snapshot_time
> order by dci.snapshot_time;
>
> For the most part, Postgres is doing the right thing: snapshot_time is
> the lead column in all of the table's indexes, so it's able to pick up
> the source rows fairly quickly in its index scan. It's also enforcing
> "dci.manager = 'BJC'" in the same scan, and does a Hash Semi Join for
> "dci.symbol in (...)".
>
> The trouble comes when enforcing the "dci.account in (...)" search
> condition: pl2.visible_accounts is a view that determines which
> accounts the current user can see, which, depending on who you are,
> can be several hundred accounts or none at all. Postgres estimates the
> output of this subquery as two rows, but in my case it's actually 240.
>
> Unfortunately, that estimate leads the query planner to think a nested
> loop is cheap enough to enforce this condition, when actually it's
> really expensive.
>
> If I hard-code the results from pl2.visible_accounts, Postgres will do
> a hash semi join for me, which is much faster, but then I have to wrap
> this whole query up in a function in order to preserve its security
> properties. Not only is that the situation I was trying to avoid, it
> also means I can't use EXPLAIN on the query anymore.
>
> I've noticed I can also do the really sneaky "dci.account in (select
> unnest(array_agg(account)) from pl2.visible_accounts)", which tricks
> the estimator into thinking there will be 100 rows. That _really_
> feels like cheating.
>
> Besides the above, is there anything I can do to get Postgres to do a
> hash instead of a nested loop?

1) Try using EXISTS instead of IN.

2) (and the one I'd use by default) Use an INNER JOIN:

SELECT ...
FROM ... dci
JOIN (SELECT account FROM ... WHERE is_fund) accts USING (account)
JOIN (SELECT symbol FROM ... WHERE ... = 'SPY') sec USING (symbol)
WHERE ...

Rough, filled-in sketches of both options follow below.

David J.
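
For option 1, an untested sketch of the EXISTS form of the quoted query;
the table and column names are taken from Brian's original, and only the
"va" alias is new here, so adjust to your schema as needed:

select dci.snapshot_time as "time",
       round(sum(dci.todays_pl)::numeric,0) as pl
from dbo._pl_data_cache_intraday dci
where dci.snapshot_time between '2014-03-25 11:32:40.004552-05'::timestamptz
                            and '2014-03-25 12:02:40.015177-05'::timestamptz
  and dci.manager = 'BJC'
  -- correlated EXISTS in place of "dci.symbol in (...)"
  and exists (select 1
              from dbo.security_underliers sec
              where sec.ultimate_underlier = 'SPY'
                and sec.symbol = dci.symbol)
  -- correlated EXISTS in place of "dci.account in (...)"
  and exists (select 1
              from pl2.visible_accounts va
              where va.is_fund
                and va.account = dci.account)
group by dci.snapshot_time
order by dci.snapshot_time;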
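
And for option 2, the JOIN skeleton filled in with the same names, again
untested. One caveat: unlike IN/EXISTS, a plain inner join can duplicate
dci rows and inflate the sums if account or symbol is not unique in the
subqueries; if that's possible, add DISTINCT to them.

select dci.snapshot_time as "time",
       round(sum(dci.todays_pl)::numeric,0) as pl
from dbo._pl_data_cache_intraday dci
-- the joins replace the two IN (...) membership tests
join (select account from pl2.visible_accounts where is_fund) accts
     using (account)
join (select symbol from dbo.security_underliers
      where ultimate_underlier = 'SPY') sec
     using (symbol)
where dci.snapshot_time between '2014-03-25 11:32:40.004552-05'::timestamptz
                            and '2014-03-25 12:02:40.015177-05'::timestamptz
  and dci.manager = 'BJC'
group by dci.snapshot_time
order by dci.snapshot_time;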