OK. Well, fortunately for us, we have a lot of possible solutions to this problem, and it sounds like actually getting statistics for attributes ? 'reference' is not realistic. I just wanted to make sure it wasn't some configuration error on our part. Can anyone explain where exactly the estimate for that clause comes from?

I tried adding an index, and I don't think it improved the estimation: the planner still thinks there will be ~9k rows as a result of type_id = 23 AND attributes ? 'reference' [1]. It might make the pathological plan less likely, though it's not clear to me that it reduces the risk of a pathological plan to zero.

I also tried wrapping it in a subquery [2]. The estimate is, of course, still awful, but it doesn't matter anymore because the planner can't pick a plan that leverages its low estimate. Its only choice is a simple filter on the results.

[1]
# CREATE INDEX foobarbaz ON component((attributes -> 'reference')) WHERE ( attributes ? 'reference' );
CREATE INDEX

judicata=# explain (analyze, buffers) declare "foo_cursor" cursor for
SELECT ref.case_id, array_agg(ref.attributes -> 'reference')
FROM component ref
JOIN document c ON c.id = ref.case_id
WHERE ref.type_id = 23
  AND ref.attributes ? 'reference'
  AND NOT 0 = ANY(c.types)
GROUP BY ref.case_id;

                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=419667.86..419667.99 rows=10 width=34) (actual time=97074.773..97197.487 rows=90969 loops=1)
   Buffers: shared hit=16954389 read=4533956 dirtied=2963 written=4759
   ->  Nested Loop  (cost=5472.44..419628.76 rows=5213 width=34) (actual time=537.202..94710.844 rows=2488142 loops=1)
         Buffers: shared hit=16954389 read=4533956 dirtied=2963 written=4759
         ->  Bitmap Heap Scan on component ref  (cost=5468.01..342716.88 rows=9087 width=34) (actual time=534.862..49617.945 rows=4624280 loops=1)
               Recheck Cond: (attributes ? 'reference'::text)
               Rows Removed by Index Recheck: 28739170
               Filter: (type_id = 23)
               Rows Removed by Filter: 165268
               Buffers: shared hit=25 read=921758 dirtied=2963 written=906
               ->  Bitmap Index Scan on foobarbaz  (cost=0.00..5465.74 rows=98636 width=0) (actual time=532.215..532.215 rows=4789548 loops=1)
                     Buffers: shared read=59300 written=57
         ->  Bitmap Heap Scan on document c  (cost=4.43..8.45 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=4624280)
               Recheck Cond: (id = ref.case_id)
               Filter: (0 <> ALL (types))
               Rows Removed by Filter: 0
               Buffers: shared hit=16954364 read=3612198 written=3853
               ->  Bitmap Index Scan on document_pkey  (cost=0.00..4.43 rows=1 width=0) (actual time=0.003..0.003 rows=1 loops=4624280)
                     Index Cond: (id = ref.case_id)
                     Buffers: shared hit=14082540 read=1859742 written=1974
 Total runtime: 97217.718 ms

[2]
# explain (analyze, buffers) declare "foo_cursor" cursor for
SELECT *
FROM (SELECT ref.case_id AS case_id, array_agg(ref.attributes -> 'reference') AS reference
      FROM component ref
      JOIN document c ON c.id = ref.case_id
      WHERE ref.type_id = 23
        AND NOT 0 = ANY(c.types)
      GROUP BY ref.case_id) AS t
WHERE reference IS NOT NULL;

                                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=5636347.52..12524155.45 rows=9817 width=34) (actual time=165466.502..195035.433 rows=93580 loops=1)
   Filter: (array_agg((ref.attributes -> 'reference'::text)) IS NOT NULL)
   Buffers: shared hit=13884 read=2085572 written=2952, temp read=902337 written=902337
   ->  Merge Join  (cost=5636347.52..12458841.11 rows=5213367 width=34) (actual time=165383.814..193813.490 rows=5115136 loops=1)
         Merge Cond: (c.id = ref.case_id)
         Buffers: shared hit=13884 read=2085572 written=2952, temp read=902337 written=902337
         ->  Index Scan using document_pkey on document c  (cost=0.43..6696889.20 rows=2128590 width=4) (actual time=0.009..24720.726 rows=94634 loops=1)
               Filter: (0 <> ALL (types))
               Rows Removed by Filter: 70829
               Buffers: shared hit=13852 read=195821
         ->  Materialize  (cost=5636345.76..5681782.42 rows=9087332 width=34) (actual time=165383.798..168027.149 rows=9120904 loops=1)
               Buffers: shared hit=32 read=1889751 written=2952, temp read=902337 written=902337
               ->  Sort  (cost=5636345.76..5659064.09 rows=9087332 width=34) (actual time=165383.793..167173.325 rows=9120904 loops=1)
                     Sort Key: ref.case_id
                     Sort Method: external merge  Disk: 1392648kB
                     Buffers: shared hit=32 read=1889751 written=2952, temp read=902337 written=902337
                     ->  Bitmap Heap Scan on component ref  (cost=481859.39..3592128.04 rows=9087332 width=34) (actual time=20950.899..145515.599 rows=9120904 loops=1)
                           Recheck Cond: (type_id = 23)
                           Rows Removed by Index Recheck: 57286889
                           Buffers: shared hit=32 read=1889751 written=2952
                           ->  Bitmap Index Scan on component_type_id  (cost=0.00..479587.56 rows=9087332 width=0) (actual time=20947.739..20947.739 rows=12143019 loops=1)
                                 Index Cond: (type_id = 23)
                                 Buffers: shared read=164918 written=2816
 Total runtime: 195213.232 ms

On Wed, Jun 10, 2015 at 1:01 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
> On Wed, Jun 10, 2015 at 2:40 PM, Josh Berkus <josh@xxxxxxxxxxxx> wrote:
>> On 06/10/2015 11:32 AM, Merlin Moncure wrote:
>>> This is a fundamental issue with using 'database in a box' datatypes
>>> like hstore and jsonb.  They are opaque to the statistics gathering
>>> system and so are unable to give reasonable estimates beyond broad
>>> assumptions.
>>> Speaking generally, the workarounds are to:
>>>
>>> *) disable particular plan choices for this query
>>> (materialize/nestloop are common culprits)
>>>
>>> *) create btree indexes around specific extraction clauses
>>>
>>> *) refactor some of the query into a set returning function with a
>>> custom ROWS clause
>>>
>>> *) try an alternate indexing strategy such as jsonb/jsquery
>>>
>>> *) move out of hstore and into a more standard relational structure
>>
>> You forgot:
>>
>> *) Fund a PostgreSQL developer to add selectivity estimation and stats
>> to hstore.
>
> Well, I don't know.  That's really complex to the point of making me
> wonder if it's worth doing even given infinite time and resources.  If
> it was my money, I'd be researching a clean way to inject estimate
> returning expressions into the query that the planner could utilize.
> Not 'hints' which are really about managing the output of the planner,
> just what feeds in.  Also lots of various solutions of alcohol to
> lubricate the attendant -hackers discussions.
>
> merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
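To illustrate the "set returning function with a custom ROWS clause" workaround from the list above, here is a minimal sketch using the table and predicates from the query in this thread. The function name is made up, and the ROWS figure is a placeholder you would set from your own knowledge of the data (the actual plans above show roughly 4.6M matching rows); the planner will use that number instead of its default guess for the opaque hstore clause:

```sql
-- Sketch only: wrap the hard-to-estimate predicate in a SQL function so
-- the planner can be handed an explicit row-count estimate via ROWS.
CREATE OR REPLACE FUNCTION component_with_reference()
RETURNS SETOF component
LANGUAGE sql STABLE
ROWS 4600000  -- placeholder estimate supplied by hand, not computed
AS $$
  SELECT *
  FROM component
  WHERE type_id = 23
    AND attributes ? 'reference'
$$;

-- The original query can then join against the function instead of the
-- bare table, carrying the hand-supplied estimate into the join planning:
SELECT ref.case_id, array_agg(ref.attributes -> 'reference')
FROM component_with_reference() ref
JOIN document c ON c.id = ref.case_id
WHERE NOT 0 = ANY(c.types)
GROUP BY ref.case_id;
```

Note the trade-off: a function scan can't use indexes on component directly, so this mainly helps when the bad estimate (rather than the scan itself) is what pushes the planner into a pathological join strategy.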