I want to do a JOIN against a subquery that is doing an aggregation. The query itself is relatively straightforward, but has poor performance.
LINE 1: ...ue), COUNT(DISTINCT value) FROM other WHERE other=a.other GR...
^
HINT: There is an entry for table "a", but it cannot be referenced from this part of the query.
In case it's helpful, here's the table definitions:
CREATE TABLE base (id INTEGER PRIMARY KEY, value TEXT, other INTEGER);
CREATE TABLE other (other INTEGER, value INTEGER);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Merge Right Join (cost=27619.21..27741.23 rows=3 width=33) (actual time=105.045..115.539 rows=3 loops=1)
Merge Cond: (other.other = a.other)
-> GroupAggregate (cost=27602.28..27711.74 rows=1001 width=20) (actual time=104.989..115.452 rows=3 loops=1)
Group Key: other.other
InitPlan 1 (returns $0)
-> Unique (cost=16.93..16.95 rows=3 width=4) (actual time=0.083..0.127 rows=3 loops=1)
-> Sort (cost=16.93..16.94 rows=3 width=4) (actual time=0.073..0.085 rows=3 loops=1)
Sort Key: base.other
Sort Method: quicksort Memory: 25kB
-> Index Scan using base_pkey on base (cost=0.29..16.91 rows=3 width=4) (actual time=0.019..0.042 rows=3 loops=1)
Index Cond: (id = ANY ('{4,56,102}'::integer[]))
-> Sort (cost=27585.34..27610.20 rows=9945 width=8) (actual time=99.401..107.199 rows=3035 loops=1)
Sort Key: other.other
Sort Method: quicksort Memory: 239kB
-> Seq Scan on other (cost=0.00..26925.00 rows=9945 width=8) (actual time=0.708..90.738 rows=3035 loops=1)
Filter: (other = ANY ($0))
Rows Removed by Filter: 996965
-> Sort (cost=16.93..16.94 rows=3 width=13) (actual time=0.044..0.051 rows=3 loops=1)
Sort Key: a.other
Sort Method: quicksort Memory: 25kB
-> Index Scan using base_pkey on base a (cost=0.29..16.91 rows=3 width=13) (actual time=0.016..0.027 rows=3 loops=1)
Index Cond: (id = ANY ('{4,56,102}'::integer[]))
Planning time: 4.163 ms
Execution time: 115.665 ms
EXPLAIN ANALYZE SELECT a.*, b.* FROM base AS a LEFT OUTER JOIN (SELECT other, COUNT(value), COUNT(DISTINCT value) FROM other GROUP BY other) AS b USING (other) WHERE id IN (4, 56, 102);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Merge Right Join (cost=127786.02..137791.07 rows=3 width=60) (actual time=7459.042..12060.805 rows=3 loops=1)
Merge Cond: (other.other = a.other)
-> GroupAggregate (cost=127763.19..137765.69 rows=200 width=20) (actual time=7143.486..12057.835 rows=830 loops=1)
Group Key: other.other
-> Sort (cost=127763.19..130263.31 rows=1000050 width=8) (actual time=7137.594..9624.119 rows=829088 loops=1)
Sort Key: other.other
Sort Method: external merge Disk: 17576kB
-> Seq Scan on other (cost=0.00..14425.50 rows=1000050 width=8) (actual time=0.555..2727.461 rows=1000000 loops=1)
-> Sort (cost=22.83..22.84 rows=3 width=40) (actual time=0.103..0.112 rows=3 loops=1)
Sort Key: a.other
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on base a (cost=12.87..22.81 rows=3 width=40) (actual time=0.048..0.064 rows=3 loops=1)
Recheck Cond: (id = ANY ('{4,56,102}'::integer[]))
Heap Blocks: exact=1
-> Bitmap Index Scan on base_pkey (cost=0.00..12.87 rows=3 width=0) (actual time=0.029..0.029 rows=3 loops=1)
Index Cond: (id = ANY ('{4,56,102}'::integer[]))
Planning time: 2.179 ms
Execution time: 12080.172 ms
Here it is:
SELECT a.*, b.* FROM base AS a
LEFT OUTER JOIN
(SELECT other, COUNT(value), COUNT(DISTINCT value) FROM other GROUP BY other) AS b
USING (other)
WHERE id IN (4, 56, 102);
It's significantly faster, but more complicated (and repetitive), if I add the following:
WHERE other = ANY(ARRAY(SELECT DISTINCT other FROM base WHERE id IN (4, 56, 102)))I tried adding the following:
other IN (a.other)Or:
other = a.other
But I get this error:
ERROR: invalid reference to FROM-clause entry for table "a"LINE 1: ...ue), COUNT(DISTINCT value) FROM other WHERE other=a.other GR...
^
HINT: There is an entry for table "a", but it cannot be referenced from this part of the query.
Is there a way to do something like that simpler query so the subquery can get better performance by filtering only to what it needs instead of doing the GROUP BY on the whole table?
Thanks,
Dave
CREATE TABLE base (id INTEGER PRIMARY KEY, value TEXT, other INTEGER);
CREATE TABLE other (other INTEGER, value INTEGER);
And the explain results:
EXPLAIN ANALYZE SELECT a.*, b.* FROM base AS a LEFT OUTER JOIN (SELECT other, COUNT(value), COUNT(DISTINCT value) FROM other WHERE other = ANY(ARRAY(SELECT DISTINCT other FROM base WHERE id IN (4, 56, 102))) GROUP BY other) AS b USING (other) WHERE id IN (4, 56, 102);QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Merge Right Join (cost=27619.21..27741.23 rows=3 width=33) (actual time=105.045..115.539 rows=3 loops=1)
Merge Cond: (other.other = a.other)
-> GroupAggregate (cost=27602.28..27711.74 rows=1001 width=20) (actual time=104.989..115.452 rows=3 loops=1)
Group Key: other.other
InitPlan 1 (returns $0)
-> Unique (cost=16.93..16.95 rows=3 width=4) (actual time=0.083..0.127 rows=3 loops=1)
-> Sort (cost=16.93..16.94 rows=3 width=4) (actual time=0.073..0.085 rows=3 loops=1)
Sort Key: base.other
Sort Method: quicksort Memory: 25kB
-> Index Scan using base_pkey on base (cost=0.29..16.91 rows=3 width=4) (actual time=0.019..0.042 rows=3 loops=1)
Index Cond: (id = ANY ('{4,56,102}'::integer[]))
-> Sort (cost=27585.34..27610.20 rows=9945 width=8) (actual time=99.401..107.199 rows=3035 loops=1)
Sort Key: other.other
Sort Method: quicksort Memory: 239kB
-> Seq Scan on other (cost=0.00..26925.00 rows=9945 width=8) (actual time=0.708..90.738 rows=3035 loops=1)
Filter: (other = ANY ($0))
Rows Removed by Filter: 996965
-> Sort (cost=16.93..16.94 rows=3 width=13) (actual time=0.044..0.051 rows=3 loops=1)
Sort Key: a.other
Sort Method: quicksort Memory: 25kB
-> Index Scan using base_pkey on base a (cost=0.29..16.91 rows=3 width=13) (actual time=0.016..0.027 rows=3 loops=1)
Index Cond: (id = ANY ('{4,56,102}'::integer[]))
Planning time: 4.163 ms
Execution time: 115.665 ms
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Merge Right Join (cost=127786.02..137791.07 rows=3 width=60) (actual time=7459.042..12060.805 rows=3 loops=1)
Merge Cond: (other.other = a.other)
-> GroupAggregate (cost=127763.19..137765.69 rows=200 width=20) (actual time=7143.486..12057.835 rows=830 loops=1)
Group Key: other.other
-> Sort (cost=127763.19..130263.31 rows=1000050 width=8) (actual time=7137.594..9624.119 rows=829088 loops=1)
Sort Key: other.other
Sort Method: external merge Disk: 17576kB
-> Seq Scan on other (cost=0.00..14425.50 rows=1000050 width=8) (actual time=0.555..2727.461 rows=1000000 loops=1)
-> Sort (cost=22.83..22.84 rows=3 width=40) (actual time=0.103..0.112 rows=3 loops=1)
Sort Key: a.other
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on base a (cost=12.87..22.81 rows=3 width=40) (actual time=0.048..0.064 rows=3 loops=1)
Recheck Cond: (id = ANY ('{4,56,102}'::integer[]))
Heap Blocks: exact=1
-> Bitmap Index Scan on base_pkey (cost=0.00..12.87 rows=3 width=0) (actual time=0.029..0.029 rows=3 loops=1)
Index Cond: (id = ANY ('{4,56,102}'::integer[]))
Planning time: 2.179 ms
Execution time: 12080.172 ms