On Mon, Mar 14, 2011 at 10:54 AM, Claudio Freire <klaussfreire@xxxxxxxxx> wrote: > Nothing? > > No ideas? > > Did I forget to include some useful bit? > > On Fri, Mar 4, 2011 at 8:22 PM, Claudio Freire <klaussfreire@xxxxxxxxx> wrote: >> Hello, first post to this list. >> >> I have this query that ran in milliseconds in postgres 8.3.7 (usually 50, >> 100ms), and now it takes a full 10 minutes to complete. >> >> I tracked the problem to the usage of hash aggregates to resolve EXISTS >> clauses in conjunction with large IN clauses, which seem to reverse the >> execution plan - in 8.3.7, it would use indices to fetch the rows from the >> IN, then compute the exists with a nested loop, never doing big sequential >> scans. In 9.0.3, it computes the set of applicable entries with a hash >> aggregate, but in order to do that it performs a huge index scan - no >> sequential scans either, but the big index scan is worse. >> >> 9.0.3 always misses the estimate of how many rows will come out the hash >> aggregate, always estimating 200, while in fact the real count is more like >> 300.000. I've tried increasing statistics in all the columns involved, up to >> 4000 for each, to the point where it accurately estimates the input to the >> hash agg, but the output is always estimated to be 200 rows. >> >> Rewriting the query to use 0 < (select count(*)..) instead of EXISTS (select >> * ..) does revert to the old postgres 8.3 plan, although intuitively I would >> think it to be sub-optimal. >> >> The tables in question receive many updates, but never in such a volume as >> to create enough bloat - plus, the tests I've been running are on a >> pre-production server without much traffic (so not many updates - probably >> none in weeks). >> >> The server is a Core 2 E7400 dual core with 4GB of ram running linux and a >> pg 9.0.3 / 8.3.7 (both there, doing migration testing) built from source. >> Quite smaller than our production server, but I've tested the issue on >> higher-end hardware and it produces the same results. >> >> Any ideas as to how to work around this issue? >> >> I can't plug the select count() version everywhere, since I won't be using >> this form of the query every time (it's generated programatically with an >> ORM), and some forms of it perform incredibly worse with the select count(). >> >> Also, any help I can provide to fix it upstream I'll be glad to - I believe >> (I would have to check) I can even create a dump of the tables (stripping >> sensitive info of course) - only, well, you'll see the size below - a tad >> big to be mailing it ;-) >> >> pg 9.0 is configured with: >> >> work_mem = 64M >> shared_buffers = 512M >> temp_buffers = 64M >> effective_cache_size = 128M >> >> pg 8.3.7 is configured with: >> >> work_mem = 64M >> shared_buffers = 100M >> temp_buffers = 64M >> effective_cache_size = 128M >> >> >> The query in question: >> >>> SELECT member_statistics.member_id >>> FROM member_statistics >>> WHERE member_statistics.member_id IN ( <<400 ids>> ) AND (EXISTS >>> (SELECT mat1.tag_id >>> FROM member_all_tags_v AS mat1 >>> WHERE mat1.member_id = member_statistics.member_id AND mat1.tag_id >>> IN (640, 641, 3637, 3638, 637, 638, 639) AND mat1.polarity >= 90)) hm the regression in and of itself is interesting, but I wonder if you can get past your issue like this: SELECT member_statistics.member_id FROM member_statistics WHERE member_statistics.member_id IN ( <<400 ids>> ) AND (EXISTS (SELECT mat1.tag_id FROM member_all_tags_v AS mat1 WHERE mat1.member_id = member_statistics.member_id AND mat1.tag_id IN (640, 641, 3637, 3638, 637, 638, 639) AND mat1.polarity >= 90)) changes to: SELECT member_statistics.member_id FROM member_statistics WHERE EXISTS ( SELECT mat1.tag_id FROM member_all_tags_v AS mat1 WHERE mat1.member_id = member_statistics.member_id AND mat1.tag_id IN (640, 641, 3637, 3638, 637, 638, 639) AND mat1.polarity >= 90 AND mat1.member_id IN ( <<400 ids>> ) ) also, always try to compare vs straight join version: SELECT member_statistics.member_id FROM member_statistics JOIN VALUES ( <<400 ids>> ) q(member_id) using (member_id) JOIN ( SELECT mat1.member_id FROM member_all_tags_v AS mat1 WHERE mat1.tag_id IN (640, 641, 3637, 3638, 637, 638, 639) AND mat1.polarity >= 90) p USING(member_id) ) p using(member_id); merlin -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance