On 15 Září 2011, 11:07, Yang Zhang wrote: > On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra <tv@xxxxxxxx> wrote: >> On 15 Září 2011, 9:53, Yang Zhang wrote: >>> I have a simple query that's been running for a while, which is fine, >>> but it seems to be running very slowly, which is a problem: >>> >>> mydb=# explain select user_id from den where user_id not in (select >>> duid from user_mappings) and timestamp between '2009-04-01' and >>> '2010-04-01'; >>> >>> QUERY PLAN >>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >>> Seq Scan on den (cost=711.58..66062724212.74 rows=22634720 width=4) >>> Filter: (("timestamp" >= '2009-04-01 00:00:00'::timestamp without >>> time zone) AND ("timestamp" <= '2010-04-01 00:00:00'::timestamp >>> without time zone) AND (NOT (SubPlan 1))) >>> SubPlan 1 >>> -> Materialize (cost=711.58..1223.38 rows=36780 width=4) >>> -> Seq Scan on user_mappings (cost=0.00..530.80 rows=36780 >>> width=4) >>> >>> user_mappings is fairly small: >> >> The problem is that for each of the 22634720 rows in "den" a separate >> uncorrelated subquery (a seq scan on user_mappings) has to be executed. >> Althogh the subquery is not very expensive, multiplied by the number of >> rows in "den" the total cost is extreme. >> >> The only solution is to get rid of the "not in" subquery - try to turn >> it >> to a join like this: >> >> SELECT user_id FROM den LEFT JOIN user_mappings ON (user_id = duid) >> WHERE (timestamp BETWEEN '2009-04-01' AND '2010-04-01') >> AND (duid IS NULL) >> >> That should give the same result I guess. > > This worked great, thank you. Too bad the planner isn't smart enough > to do this yet! It's probably a bit more complicated I guess - there are probably cases when this would not work this great. Anyway, you should consider Toby Corkindale's recommendation and check if there's an index on that timestamp column - that might further improve the performance. (I don't think an index on user_id might be useful in this case). Tomas -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general