2010/4/26 Vlad Arkhipov <arhipov@xxxxxxxxxxxx>: > >> On Thu, Apr 22, 2010 at 10:37 PM, Vlad Arkhipov <arhipov@xxxxxxxxxxxx> >> wrote: >> >>> >>> I don't think this is just an issue with statistics, because the same >>> problem arises when I try executing a query like this: >>> >> >> I'm not sure how you think this proves that it isn't a problem with >> statistics, but I think what you should be focusing on here, looking >> back to your original email, is that the plans that are actually much >> faster have almost as much estimated cost as the slower one. Since >> all your data is probably fully cached, at a first cut, I might try >> setting random_page_cost and seq_page_cost to 0.005 or so, and >> adjusting effective_cache_size to something appropriate. >> >> ...Robert >> >> > > Ok. I thougth it's quite obvious because of these two queries. I can't > understand why the estimated rows count is 40040 in the first plan. In the first query, the planner doesn't use the information of the 2,3,4. It just does a : I'll bet I'll have 2 rows in t1 (I think it should say 3, but it doesn't) So it divide the estimated number of rows in the t2 table by 5 (different values) and multiply by 2 (rows) : 40040. In the second query the planner use a different behavior : it did expand the value of t1.t to t2.t for each join relation and find a costless plan. (than the one using seqscan on t2) We are here in corner case situation where n_distinc valuest < statistics on the column and where we might be able to improve the planner decision. I believe I have already read something on this topic on -hackers... > > test=# explain analyze select * from t2 join t1 on t1.t = t2.t where > t1.t in (2,3,4); > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------ > Hash Join (cost=1.09..2319.87 rows=40040 width=32) (actual > time=0.050..356.269 rows=400 loops=1) > Hash Cond: (t2.t = t1.t) > -> Seq Scan on t2 (cost=0.00..1543.00 rows=100100 width=16) (actual > time=0.013..176.087 rows=100100 loops=1) > -> Hash (cost=1.07..1.07 rows=2 width=16) (actual time=0.023..0.023 > rows=3 loops=1) > Buckets: 1024 Batches: 1 Memory Usage: 1kB > -> Seq Scan on t1 (cost=0.00..1.07 rows=2 width=16) (actual > time=0.006..0.014 rows=3 loops=1) > Filter: (t = ANY ('{2,3,4}'::bigint[])) > Total runtime: 356.971 ms > (8 rows) > > test=# explain analyze select * from t2 join t1 on t1.t = t2.t where > t1.t = 2 union all select * from t2 join t1 on t1.t = t2.t where t1.t = > 3 union all select * from t2 join t1 on t1.t = t2.t where t1.t = 4; > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------- > Append (cost=0.00..112.42 rows=407 width=32) (actual time=0.048..3.487 > rows=400 loops=1) > -> Nested Loop (cost=0.00..47.51 rows=197 width=32) (actual > time=0.045..1.061 rows=200 loops=1) > -> Seq Scan on t1 (cost=0.00..1.06 rows=1 width=16) (actual > time=0.011..0.014 rows=1 loops=1) > Filter: (t = 2) > -> Index Scan using t_idx on t2 (cost=0.00..44.48 rows=197 > width=16) (actual time=0.026..0.382 rows=200 loops=1) > Index Cond: (pg_temp_2.t2.t = 2) > -> Nested Loop (cost=0.00..32.67 rows=117 width=32) (actual > time=0.019..0.599 rows=100 loops=1) > -> Seq Scan on t1 (cost=0.00..1.06 rows=1 width=16) (actual > time=0.003..0.006 rows=1 loops=1) > Filter: (t = 3) > -> Index Scan using t_idx on t2 (cost=0.00..30.43 rows=117 > width=16) (actual time=0.010..0.211 rows=100 loops=1) > Index Cond: (pg_temp_2.t2.t = 3) > -> Nested Loop (cost=0.00..28.17 rows=93 width=32) (actual > time=0.017..0.534 rows=100 loops=1) > -> Seq Scan on t1 (cost=0.00..1.06 rows=1 width=16) (actual > time=0.005..0.008 rows=1 loops=1) > Filter: (t = 4) > -> Index Scan using t_idx on t2 (cost=0.00..26.18 rows=93 > width=16) (actual time=0.007..0.187 rows=100 loops=1) > Index Cond: (pg_temp_2.t2.t = 4) > Total runtime: 4.190 ms > (17 rows) > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Cédric Villemain -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance