I get it now... project_id=115 has a frequency of 0.09241 in pg_stats. So if ((project_id = 115) AND (project_id = 115)) is considered as two independent conditions, the row estimate ends up being 0.09241 * 0.09241 * 1.20163e+07 (reltuples from pg_class) = 102614. http://www.postgresql.org/docs/9.0/static/row-estimation-examples.html was a big help. Matt On Wed, Feb 27, 2013 at 9:08 AM, Matt Daw <matt@xxxxxxxxxxxxxxxxxxx> wrote: > Quick follow up... I've found that the row estimate in: > > explain select count(id) from versions where project_id IN (80,115) > AND project_id=115; > QUERY PLAN > --------------------------------------------------------------------------------------------------------- > Aggregate (cost=178572.75..178572.76 rows=1 width=4) > -> Index Scan using dneg_versions_project_id on versions > (cost=0.00..178306.94 rows=106323 width=4) > Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND > (project_id = 115)) > > > ... is the sum of two other estimates, seen when rewriting the query > using OR instead of IN: > > > explain select count(id) from versions where (project_id = 80 OR > project_id = 115) AND project_id=115; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=305896.95..305896.96 rows=1 width=4) > -> Bitmap Heap Scan on versions (cost=2315.08..305632.00 > rows=105980 width=4) > Recheck Cond: (((project_id = 80) AND (project_id = 115)) OR > ((project_id = 115) AND (project_id = 115))) > -> BitmapOr (cost=2315.08..2315.08 rows=106323 width=0) > -> Bitmap Index Scan on dneg_versions_project_id > (cost=0.00..94.52 rows=3709 width=0) > Index Cond: ((project_id = 80) AND (project_id = 115)) > -> Bitmap Index Scan on dneg_versions_project_id > (cost=0.00..2167.57 rows=102614 width=0) > Index Cond: ((project_id = 115) AND (project_id = 115)) > > 106323 = 3709 + 102614 > > Looks like the underlying problem is that the estimate for > ((project_id = 115) AND (project_id = 115)) doesn't end up being the > same as (project_id=115) on its own. > > Matt > > On Tue, Feb 26, 2013 at 11:35 AM, Matt Daw <matt@xxxxxxxxxxxxxxxxxxx> wrote: >> Howdy, the query generator in my app sometimes creates redundant >> filters of the form: >> >> project_id IN ( <list of projects user has permission to see> ) AND >> project_id = <single project user is looking at > >> >> ... and this is leading to a bad estimate (and thus a bad plan) on a >> few complex queries. I've included simplified examples below. This >> server is running 9.0.10 and the statistics target has been updated to >> 1000 on the project_id column. I've also loaded the one table into a >> 9.2.2 instance and replicated the behaviour. >> >> I can change how the query is being generated, but I'm curious why I'm >> getting a bad estimate. Is this an expected result? >> >> Thanks! >> >> Matt >> >> ============= >> >> 1) Filter on project_id only, row estimate for Bitmap Index Scan quite good. >> >> explain (analyze,buffers) select count(id) from versions WHERE project_id=115; >> >> QUERY PLAN >> ----------------------------------------------------------------------------------------------------------------------------------------------------------- >> Aggregate (cost=1218111.01..1218111.02 rows=1 width=4) (actual >> time=1531.341..1531.342 rows=1 loops=1) >> Buffers: shared hit=452619 >> -> Bitmap Heap Scan on versions (cost=34245.06..1215254.86 >> rows=1142461 width=4) (actual time=148.394..1453.383 rows=1114197 >> loops=1) >> Recheck Cond: (project_id = 115) >> Buffers: shared hit=452619 >> -> Bitmap Index Scan on versions_project_id >> (cost=0.00..33959.45 rows=1142461 width=0) (actual >> time=139.709..139.709 rows=1116037 loops=1) >> Index Cond: (project_id = 115) >> Buffers: shared hit=22077 >> Total runtime: 1531.399 ms >> >> 2) Filter on project_id IN () AND project_id. Row estimate is ~10x lower. >> >> explain (analyze,buffers) select count(id) from versions WHERE >> project_id IN (80,115) AND project_id=115;; >> >> QUERY PLAN >> --------------------------------------------------------------------------------------------------------------------------------------------------------- >> Aggregate (cost=327066.18..327066.19 rows=1 width=4) (actual >> time=1637.889..1637.889 rows=1 loops=1) >> Buffers: shared hit=458389 >> -> Bitmap Heap Scan on versions (cost=3546.56..326793.17 >> rows=109201 width=4) (actual time=155.107..1557.453 rows=1114180 >> loops=1) >> Recheck Cond: ((project_id = ANY ('{80,115}'::integer[])) AND >> (project_id = 115)) >> Buffers: shared hit=458389 >> -> Bitmap Index Scan on versions_project_id >> (cost=0.00..3519.26 rows=109201 width=0) (actual time=145.502..145.502 >> rows=1125436 loops=1) >> Index Cond: ((project_id = ANY ('{80,115}'::integer[])) >> AND (project_id = 115)) >> Buffers: shared hit=22076 >> Total runtime: 1637.941 ms On Wed, Feb 27, 2013 at 9:08 AM, Matt Daw <matt@xxxxxxxxxxxxxxxxxxx> wrote: > Quick follow up... I've found that the row estimate in: > > explain select count(id) from versions where project_id IN (80,115) > AND project_id=115; > QUERY PLAN > --------------------------------------------------------------------------------------------------------- > Aggregate (cost=178572.75..178572.76 rows=1 width=4) > -> Index Scan using dneg_versions_project_id on versions > (cost=0.00..178306.94 rows=106323 width=4) > Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND > (project_id = 115)) > > > ... is the sum of two other estimates, seen when rewriting the query > using OR instead of IN: > > > explain select count(id) from versions where (project_id = 80 OR > project_id = 115) AND project_id=115; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=305896.95..305896.96 rows=1 width=4) > -> Bitmap Heap Scan on versions (cost=2315.08..305632.00 > rows=105980 width=4) > Recheck Cond: (((project_id = 80) AND (project_id = 115)) OR > ((project_id = 115) AND (project_id = 115))) > -> BitmapOr (cost=2315.08..2315.08 rows=106323 width=0) > -> Bitmap Index Scan on dneg_versions_project_id > (cost=0.00..94.52 rows=3709 width=0) > Index Cond: ((project_id = 80) AND (project_id = 115)) > -> Bitmap Index Scan on dneg_versions_project_id > (cost=0.00..2167.57 rows=102614 width=0) > Index Cond: ((project_id = 115) AND (project_id = 115)) > > 106323 = 3709 + 102614 > > Looks like the underlying problem is that the estimate for > ((project_id = 115) AND (project_id = 115)) doesn't end up being the > same as (project_id=115) on its own. > > Matt > > On Tue, Feb 26, 2013 at 11:35 AM, Matt Daw <matt@xxxxxxxxxxxxxxxxxxx> wrote: >> Howdy, the query generator in my app sometimes creates redundant >> filters of the form: >> >> project_id IN ( <list of projects user has permission to see> ) AND >> project_id = <single project user is looking at > >> >> ... and this is leading to a bad estimate (and thus a bad plan) on a >> few complex queries. I've included simplified examples below. This >> server is running 9.0.10 and the statistics target has been updated to >> 1000 on the project_id column. I've also loaded the one table into a >> 9.2.2 instance and replicated the behaviour. >> >> I can change how the query is being generated, but I'm curious why I'm >> getting a bad estimate. Is this an expected result? >> >> Thanks! >> >> Matt >> >> ============= >> >> 1) Filter on project_id only, row estimate for Bitmap Index Scan quite good. >> >> explain (analyze,buffers) select count(id) from versions WHERE project_id=115; >> >> QUERY PLAN >> ----------------------------------------------------------------------------------------------------------------------------------------------------------- >> Aggregate (cost=1218111.01..1218111.02 rows=1 width=4) (actual >> time=1531.341..1531.342 rows=1 loops=1) >> Buffers: shared hit=452619 >> -> Bitmap Heap Scan on versions (cost=34245.06..1215254.86 >> rows=1142461 width=4) (actual time=148.394..1453.383 rows=1114197 >> loops=1) >> Recheck Cond: (project_id = 115) >> Buffers: shared hit=452619 >> -> Bitmap Index Scan on versions_project_id >> (cost=0.00..33959.45 rows=1142461 width=0) (actual >> time=139.709..139.709 rows=1116037 loops=1) >> Index Cond: (project_id = 115) >> Buffers: shared hit=22077 >> Total runtime: 1531.399 ms >> >> 2) Filter on project_id IN () AND project_id. Row estimate is ~10x lower. >> >> explain (analyze,buffers) select count(id) from versions WHERE >> project_id IN (80,115) AND project_id=115;; >> >> QUERY PLAN >> --------------------------------------------------------------------------------------------------------------------------------------------------------- >> Aggregate (cost=327066.18..327066.19 rows=1 width=4) (actual >> time=1637.889..1637.889 rows=1 loops=1) >> Buffers: shared hit=458389 >> -> Bitmap Heap Scan on versions (cost=3546.56..326793.17 >> rows=109201 width=4) (actual time=155.107..1557.453 rows=1114180 >> loops=1) >> Recheck Cond: ((project_id = ANY ('{80,115}'::integer[])) AND >> (project_id = 115)) >> Buffers: shared hit=458389 >> -> Bitmap Index Scan on versions_project_id >> (cost=0.00..3519.26 rows=109201 width=0) (actual time=145.502..145.502 >> rows=1125436 loops=1) >> Index Cond: ((project_id = ANY ('{80,115}'::integer[])) >> AND (project_id = 115)) >> Buffers: shared hit=22076 >> Total runtime: 1637.941 ms On Wed, Feb 27, 2013 at 9:08 AM, Matt Daw <matt@xxxxxxxxxxxxxxxxxxx> wrote: > Quick follow up... I've found that the row estimate in: > > explain select count(id) from versions where project_id IN (80,115) > AND project_id=115; > QUERY PLAN > --------------------------------------------------------------------------------------------------------- > Aggregate (cost=178572.75..178572.76 rows=1 width=4) > -> Index Scan using dneg_versions_project_id on versions > (cost=0.00..178306.94 rows=106323 width=4) > Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND > (project_id = 115)) > > > ... is the sum of two other estimates, seen when rewriting the query > using OR instead of IN: > > > explain select count(id) from versions where (project_id = 80 OR > project_id = 115) AND project_id=115; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=305896.95..305896.96 rows=1 width=4) > -> Bitmap Heap Scan on versions (cost=2315.08..305632.00 > rows=105980 width=4) > Recheck Cond: (((project_id = 80) AND (project_id = 115)) OR > ((project_id = 115) AND (project_id = 115))) > -> BitmapOr (cost=2315.08..2315.08 rows=106323 width=0) > -> Bitmap Index Scan on dneg_versions_project_id > (cost=0.00..94.52 rows=3709 width=0) > Index Cond: ((project_id = 80) AND (project_id = 115)) > -> Bitmap Index Scan on dneg_versions_project_id > (cost=0.00..2167.57 rows=102614 width=0) > Index Cond: ((project_id = 115) AND (project_id = 115)) > > 106323 = 3709 + 102614 > > Looks like the underlying problem is that the estimate for > ((project_id = 115) AND (project_id = 115)) doesn't end up being the > same as (project_id=115) on its own. > > Matt > > On Tue, Feb 26, 2013 at 11:35 AM, Matt Daw <matt@xxxxxxxxxxxxxxxxxxx> wrote: >> Howdy, the query generator in my app sometimes creates redundant >> filters of the form: >> >> project_id IN ( <list of projects user has permission to see> ) AND >> project_id = <single project user is looking at > >> >> ... and this is leading to a bad estimate (and thus a bad plan) on a >> few complex queries. I've included simplified examples below. This >> server is running 9.0.10 and the statistics target has been updated to >> 1000 on the project_id column. I've also loaded the one table into a >> 9.2.2 instance and replicated the behaviour. >> >> I can change how the query is being generated, but I'm curious why I'm >> getting a bad estimate. Is this an expected result? >> >> Thanks! >> >> Matt >> >> ============= >> >> 1) Filter on project_id only, row estimate for Bitmap Index Scan quite good. >> >> explain (analyze,buffers) select count(id) from versions WHERE project_id=115; >> >> QUERY PLAN >> ----------------------------------------------------------------------------------------------------------------------------------------------------------- >> Aggregate (cost=1218111.01..1218111.02 rows=1 width=4) (actual >> time=1531.341..1531.342 rows=1 loops=1) >> Buffers: shared hit=452619 >> -> Bitmap Heap Scan on versions (cost=34245.06..1215254.86 >> rows=1142461 width=4) (actual time=148.394..1453.383 rows=1114197 >> loops=1) >> Recheck Cond: (project_id = 115) >> Buffers: shared hit=452619 >> -> Bitmap Index Scan on versions_project_id >> (cost=0.00..33959.45 rows=1142461 width=0) (actual >> time=139.709..139.709 rows=1116037 loops=1) >> Index Cond: (project_id = 115) >> Buffers: shared hit=22077 >> Total runtime: 1531.399 ms >> >> 2) Filter on project_id IN () AND project_id. Row estimate is ~10x lower. >> >> explain (analyze,buffers) select count(id) from versions WHERE >> project_id IN (80,115) AND project_id=115;; >> >> QUERY PLAN >> --------------------------------------------------------------------------------------------------------------------------------------------------------- >> Aggregate (cost=327066.18..327066.19 rows=1 width=4) (actual >> time=1637.889..1637.889 rows=1 loops=1) >> Buffers: shared hit=458389 >> -> Bitmap Heap Scan on versions (cost=3546.56..326793.17 >> rows=109201 width=4) (actual time=155.107..1557.453 rows=1114180 >> loops=1) >> Recheck Cond: ((project_id = ANY ('{80,115}'::integer[])) AND >> (project_id = 115)) >> Buffers: shared hit=458389 >> -> Bitmap Index Scan on versions_project_id >> (cost=0.00..3519.26 rows=109201 width=0) (actual time=145.502..145.502 >> rows=1125436 loops=1) >> Index Cond: ((project_id = ANY ('{80,115}'::integer[])) >> AND (project_id = 115)) >> Buffers: shared hit=22076 >> Total runtime: 1637.941 ms -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance