I don't have an answer for you, but this report looks suspiciously similar to the one I posted the other day at <http://archives.postgresql.org/pgsql-hackers/2011-08/msg01224.php>, which, now that I think about it, also manifested itself after the upgrade to 8.4.8. On tis, 2011-08-30 at 15:24 -0700, Ben Chobot wrote: > We recently took a copy of our production data (running on 8.4.2), > scrubbed many data fields, and then loaded it onto a qa server > (running 8.4.8). We're seeing some odd planner performance that I > think might be a bug, though I'm hoping it's just idiocy on my part. > I've analyzed things and looked into pg_stats and it seems as if the > relevant columns have about the same statistics. > > > I've managed to simplify the query, but if I make it any simpler, then > the two servers end up with the same good plan. The query is down to: > > > SELECT machines.quota_purchased > FROM machines > WHERE NOT deleted AND machines.user_id IN ( > SELECT id FROM users WHERE user_group_id IN ( > select 607547 offset 0 > ) OFFSET 0 > ); > > > > > (Those "offset 0" are in there to protect us from planner regressions > we saw when moving to 8.4. When we move to 9, they can hopefully go > away.) > > > On the production server, this returns a fairly accurate plan: > > > > QUERY PLAN > > ---------------------------------------------------------------------------------------------------------------------------------------------------------- > Nested Loop (cost=843.59..1447.90 rows=243 width=8) (actual > time=0.044..0.045 rows=1 loops=1) > -> HashAggregate (cost=843.59..845.59 rows=200 width=4) (actual > time=0.027..0.027 rows=1 loops=1) > -> Limit (cost=0.02..823.90 rows=1575 width=4) (actual > time=0.024..0.025 rows=1 loops=1) > -> Nested Loop (cost=0.02..823.90 rows=1575 width=4) > (actual time=0.023..0.024 rows=1 loops=1) > -> HashAggregate (cost=0.02..0.03 rows=1 > width=4) (actual time=0.005..0.005 rows=1 loops=1) > -> Limit (cost=0.00..0.01 rows=1 width=0) > (actual time=0.001..0.002 rows=1 loops=1) > -> Result (cost=0.00..0.01 rows=1 > width=0) (actual time=0.000..0.000 rows=1 loops=1) > -> Index Scan using users_user_groups_idx on > users (cost=0.00..804.18 rows=1575 width=8) (actual time=0.017..0.018 > rows=1 loops=1) > Index Cond: (users.user_group_id = > (607547)) > -> Index Scan using machines_sid_un on machines (cost=0.00..3.00 > rows=1 width=12) (actual time=0.015..0.015 rows=1 loops=1) > Index Cond: (machines.user_id = users.id) > Total runtime: 0.121 ms > (12 rows) > > > > > On the QA server, things are not so accurate. It doesn't hurt the > timing of this simplified query much, but when put into the actual > query, the row estimation being off by 6 orders of magnitude really > throws the planning in the wrong direction. The plan on the QA server > is: > > > > QUERY PLAN > > ----------------------------------------------------------------------------------------------------------------------------------------------------------- > Nested Loop (cost=1887.16..3671.20 rows=1192462 width=8) (actual > time=0.049..0.051 rows=1 loops=1) > -> HashAggregate (cost=1887.16..1889.16 rows=200 width=4) (actual > time=0.032..0.033 rows=1 loops=1) > -> Limit (cost=0.02..1868.20 rows=1517 width=4) (actual > time=0.027..0.029 rows=1 loops=1) > -> Nested Loop (cost=0.02..1868.20 rows=1517 width=4) > (actual time=0.027..0.028 rows=1 loops=1) > -> HashAggregate (cost=0.02..0.03 rows=1 > width=4) (actual time=0.008..0.008 rows=1 loops=1) > -> Limit (cost=0.00..0.01 rows=1 width=0) > (actual time=0.001..0.001 rows=1 loops=1) > -> Result (cost=0.00..0.01 rows=1 > width=0) (actual time=0.001..0.001 rows=1 loops=1) > -> Index Scan using users_user_groups_idx on > users (cost=0.00..1849.20 rows=1517 width=8) (actual > time=0.015..0.016 rows=1 loops=1) > Index Cond: (users.user_group_id = > (607547)) > -> Index Scan using machines_sid_un on machines (cost=0.00..8.90 > rows=1 width=12) (actual time=0.013..0.013 rows=1 loops=1) > Index Cond: (machines.user_id = users.id) > Total runtime: 0.148 ms > (12 rows) > > > > > > > The problem here (I think) seems to be that the QA server believes > that running a nested loop over 200 users.id values and joining that > against machines.user_id will result in >1M rows. The production > servers sees this more accurately as the nearly 1:1 relationship that > it is. > > > The reason I wonder if this might be a bug is because if I change the > obtuse clause "WHERE user_group_id IN (select 607547 offset 0)" to > simply "where user_group_id in (607547)" then the plan collapses to > the same plan on both servers: > > > explain analyze SELECT machines.quota_purchased > FROM machines > WHERE NOT deleted AND machines.user_id IN ( > SELECT id FROM users WHERE user_group_id IN ( > 607547 > ) OFFSET 0 > ); > > > > > > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------------------------------------------- > Nested Loop (cost=82.27..636.70 rows=62 width=8) (actual > time=0.042..0.043 rows=1 loops=1) > -> HashAggregate (cost=82.27..82.88 rows=61 width=4) (actual > time=0.024..0.025 rows=1 loops=1) > -> Limit (cost=0.00..81.51 rows=61 width=4) (actual > time=0.017..0.018 rows=1 loops=1) > -> Index Scan using users_user_groups_idx on users > (cost=0.00..81.51 rows=61 width=4) (actual time=0.015..0.016 rows=1 > loops=1) > Index Cond: (user_group_id = 607547) > -> Index Scan using machines_sid_un on machines (cost=0.00..9.07 > rows=1 width=12) (actual time=0.013..0.013 rows=1 loops=1) > Index Cond: (machines.user_id = users.id) > Total runtime: 0.106 ms > (8 rows) > > > > > But, as I understand it, that remaining OFFSET 0 should keep the > planner from caring that it's a single value or the result of a > subselect. > > > > > Anyway, anything I can check? The hardware is different so the > configuration files are different, but they're not that different in > anything other than effective_cache_size. Certainly not different in > any way I can think that would affect this kind of planning > mis-estimation. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general