Kevin Goess wrote: > We have a table "contexts" with 1.6 million rows, and a table "articles" with 1.4 million rows, where > an "article" is a particular kind of "context". We want to select from a join on those two tables > like this > > SELECT COUNT(*) > FROM contexts > JOIN articles ON (articles.context_key=contexts.context_key) > WHERE contexts.context_key IN (...); > /* and some combination of columns from articles and contexts */ > > If "IN(...)" is a query, then this guy does a seq scan on the contexts table, even if the subquery is > "select col_a from kgtest" where kgtest has one row. If however I read the ids beforehand and write > them into the query, a la "IN (111,222,333...)", then the everything is happy, up to at least 20,000 > values written into the sql, at which point smaller machines will take 2-5 minutes to parse the query. > > I can certainly write the ids inline into the SQL, but when I do that I get the distinct impression > that I'm Doing It Wrong. Is this expected behavior? It seems surprising to me. > > > To demonstrate: > > /* nothing up my sleeve */ > # select * from kgtest; > cola > --------- > 1652729 > (1 row) [...] > /* subselect, query plan does seq scan on contexts */ [...] > -> Seq Scan on kgtest (cost=0.00..34.00 rows=2400 width=4) (actual time=0.048..0.050 rows [...] There is something missing in this line, but according to what you wrote it must be "actual [...] rows=1", And yet the planner assumes that the scan will return 2400 rows. That means that your statistics are not accurate. As a first measure, you should ANALYZE the tables involved and see if the problem persists. If yes, post the new plans. Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general