I have created a GIN index using jsonb_path_ops over some JSONB columns. Initially, while the index was small, the query planner would select a Bitmap Index Scan strategy to execute queries leveraging the appropriate JSONB operator (@>). Now that the table has grown to almost 200k rows and the index has grown to 1.6M rows, the query planner prefers running Seq Scans. db=> SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname IN ('_object', 'idx_object') and relnamespace = 43315; relname | relkind | reltuples | relpages ------------+---------+------------+---------- _object | r | 185618 | 39030 idx_object | i | 1.6583e+06 | 512 (2 rows) db=> explain analyze ...; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Subquery Scan on x (cost=41814.28..41814.30 rows=1 width=25) (actual time=3954.742..3954.820 rows=5 loops=1) Filter: ((x._data IS NOT NULL) OR (x.metadata IS NOT NULL)) Rows Removed by Filter: 3 -> Unique (cost=41814.28..41814.29 rows=1 width=1562) (actual time=3954.740..3954.814 rows=8 loops=1) -> Sort (cost=41814.28..41814.28 rows=1 width=1562) (actual time=3954.738..3954.754 rows=77 loops=1) Sort Key: ... Sort Method: quicksort Memory: 63kB -> Seq Scan on _object (cost=0.00..41814.27 rows=1 width=1562) (actual time=84.980..3954.330 rows=77 loops=1) Filter: ((... @> ...::jsonb) AND (... @> ...::jsonb)) Rows Removed by Filter: 185529 Planning time: 0.261 ms Execution time: 3954.860 ms (12 rows) Disabling seq scans shows that the execution time of the Bitmap Index Scan is considerably lower, but the cost estimate is completely off. db=> explain analyze ....; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Subquery Scan on x (cost=75584.03..75584.05 rows=1 width=25) (actual time=24.591..24.664 rows=5 loops=1) Filter: ((x._data IS NOT NULL) OR (x.metadata IS NOT NULL)) Rows Removed by Filter: 3 -> Unique (cost=75584.03..75584.04 rows=1 width=1562) (actual time=24.589..24.659 rows=8 loops=1) -> Sort (cost=75584.03..75584.03 rows=1 width=1562) (actual time=24.588..24.603 rows=77 loops=1) Sort Key:... Sort Method: quicksort Memory: 63kB -> Bitmap Heap Scan on _object (cost=75580.00..75584.02 rows=1 width=1562) (actual time=24.120..24.284 rows=77 loops=1) Recheck Cond: ((... @> ...::jsonb) AND (... @> ...::jsonb)) Heap Blocks: exact=73 -> Bitmap Index Scan on idx_object (cost=0.00..75580.00 rows=1 width=0) (actual time=24.094..24.094 rows=77 loops=1) Index Cond: ((... @> ...::jsonb) AND (... @> ...::jsonb)) Planning time: 0.301 ms Execution time: 24.723 ms (14 rows) It would seem that this miscalculation of the cost of the index scan is due to the query planner lacking detailed statistics about the relevant JSONB column. Here are the statistics collected by postgresql on the relevant columns. db=> \x db=> select attname, null_frac, n_distinct, array_length(most_common_vals,1) mcv, array_length(most_common_freqs,1) mcf, array_length(most_common_elems,1) mce, array_length(most_common_elem_freqs,1) mcef, elem_count_histogram from pg_stats where schemaname = 'zero2' and tablename = '_object'; -[ RECORD 1 ]--------+---------- attname | doctype null_frac | 0 n_distinct | 56 mcv | 4 mcf | 4 mce | mcef | elem_count_histogram | -[ RECORD 2 ]--------+---------- attname | app_id null_frac | 0 n_distinct | 2374 mcv | 100 mcf | 100 mce | mcef | elem_count_histogram | -[ RECORD 3 ]--------+---------- attname | status null_frac | 0 n_distinct | 2 mcv | 2 mcf | 2 mce | mcef | elem_count_histogram | -[ RECORD 4 ]--------+---------- attname | metadata null_frac | 0.0924333 n_distinct | 985 mcv | 100 mcf | 100 mce | mcef | elem_count_histogram | As you can see, most_common_elems and most_common_elems_freqs are empty. This prevents the query planner from making any kind of meaningful estimate of the number of index rows that it would need to access as a function of the query terms. The workaround I found so far is to set a low value of random_page_cost, but this could result in the query planner using index scans for other tables and other queries, where a seq scan would actually be more appropriate. The only real solution would seem to be for the postgresql engine to support the most_common_elems and most_common_elems_freqs for the JSONB data type. Are there any plans to implement this? Thanks! -- Alex