Folks, Currently, JSONB fields don't have statistics, and estimate a flat 1% selectivity. This can result in poor query plans, and I'm wondering if anyone has a suggested workaround for this short of hacking a new selectivity function. For example, take the common case of using JSONB to hold a list of "tags" for tagging documents: Table "public.doc_tags_json" Column | Type | Modifiers --------+---------+----------- doc_id | integer | tags | jsonb | Indexes: "doc_tags_json_doc_id_idx" UNIQUE, btree (doc_id) "doc_tags_json_tags_idx" gin (tags) This query: select doc_id from doc_tags_json where tags @> '[ "math", "physics" ]' order by doc_id desc limit 25; Uses this plan: QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.43..709.79 rows=25 width=4) (actual time=24.529..340.499 rows=25 loops=1) -> Index Scan Backward using doc_tags_json_doc_id_idx on doc_tags_json (cost=0.43..283740.95 rows=10000 width=4) (actual time=24.528..340.483 rows=25 loops=1) Filter: (tags @> '["math", "physics"]'::jsonb) Rows Removed by Filter: 1011878 Planning time: 0.090 ms Execution time: 340.528 ms It does this because it expects @> '["math", "physics"]' to match 10,000 rows, which means that it expects to only scan 25,000 entries in the doc_id index to return the top 25. However, the matching condition is much rarer than it thinks, so it's actually far faster to use the index on the JSONB column: drop index doc_tags_json_doc_id_idx; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=10517.08..10517.14 rows=25 width=4) (actual time=7.594..7.602 rows=25 loops=1) -> Sort (cost=10517.08..10542.08 rows=10000 width=4) (actual time=7.593..7.596 rows=25 loops=1) Sort Key: doc_id Sort Method: top-N heapsort Memory: 26kB -> Bitmap Heap Scan on doc_tags_json (cost=92.90..10234.89 rows=10000 width=4) (actual time=6.733..7.475 rows=257 loops=1) Recheck Cond: (tags @> '["math", "physics"]'::jsonb) Heap Blocks: exact=256 -> Bitmap Index Scan on doc_tags_json_tags_idx (cost=0.00..90.40 rows=10000 width=0) (actual time=6.695..6.695 rows=257 loops=1) Index Cond: (tags @> '["math", "physics"]'::jsonb) Planning time: 0.093 ms Execution time: 7.632 ms On a normal column, I'd raise n_distinct to reflect the higher selecivity of the search terms. However, since @> uses contsel, n_distinct is ignored. Anyone know a clever workaround I don't currently see? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance