I have a table of semi-structured json that I am storing in a JSONB column with
a GIN index:
create table foo (
id text primary key,
obj jsonb
);
create index foo_obj on foo using gin (obj);
I populated the table with 10,000 rows of randomly generated JSON objects, with
random amounts of nesting. Here are the first few rows:
{"b": 0, "c": 3, "d": 2, "id": "0"}
{"b": 4, "c": {"a": 3, "b": 4, "c": 4, "d": 1}, "id": "1"}
{"a": 3, "b": 1, "c": 0, "d": 0, "id": "2"}
{"a": 3, "b": 2, "c": 4, "d": 1, "id": "3"}
{"a": 2, "b": {"b": 3, "c": 2, "d": 0}, "c": 3, "id": "4"}
{"c": 2, "id": "5"}
{"b": {"a": {"d": 3}, "b": 2, "c": 3, "d": 3}, "id": "6"}
{"a": 0, "b": 1, "id": "7"}
{"a": 1, "c": 3, "d": 3, "id": "8"}
{"a": 4, "b": 3, "c": 1, "d": 1, "id": "9"}
{"a": 0, "b": {"a": 3, "b": 0, "d": 3}, "d": 0, "id": "10"}
{"a": {"b": 2}, "b": 4, "c": 4, "d": 4, "id": "11"}
{"a": {"a": 2, "b": 3, "c": 1, "d": {"c": 4}}, "c": 1, "d": 0, "id": "12"}
...
There are two queries that I want to issue using the GIN index. The first
query, which seems to work, returns the 2 records in the test data set that
happen to have a nested key with a certain value. If I use EXPLAIN on the query,
it shows me that the index is being used.
postgres=# explain analyze select obj from foo where (obj @@ '$.a.a.a.a == 1'::jsonpath);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=44.01..48.02 rows=1 width=102) (actual time=3.531..5.402 rows=2 loops=1)
Recheck Cond: (obj @@ '($."a"."a"."a"."a" == 1)'::jsonpath)
Rows Removed by Index Recheck: 4211
Heap Blocks: exact=173
-> Bitmap Index Scan on foo_obj (cost=0.00..44.01 rows=1 width=0) (actual time=2.330..2.331 rows=4213 loops=1)
Index Cond: (obj @@ '($."a"."a"."a"."a" == 1)'::jsonpath)
Planning Time: 0.276 ms
Execution Time: 5.464 ms
(8 rows)
The second query is the one I'm having trouble with. For this one, I just want
to know if a nested key is present in the row -- I don't care what its value is.
This query works in the sense that it does return the 13 rows out of 10,000 that
happen to have the given nested key. However, EXPLAIN shows that the index
isn't being used.
postgres=# explain analyze select obj from foo where (obj @? '$.a.a.a.a'::jsonpath);
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..298.00 rows=1 width=102) (actual time=0.549..4.360 rows=13 loops=1)
Filter: (obj @? '$."a"."a"."a"."a"'::jsonpath)
Rows Removed by Filter: 9987
Planning Time: 0.164 ms
Execution Time: 4.384 ms
How can I fix the second query so it uses the GIN index? Maybe the first query
is inefficient too and I just don't understand what EXPLAIN is telling me? Is
there anything else I'm doing wrong or have muddled up? I'm new to JSONB.
P.S. For what its worth, I've come up with a variant of the first query that also
works, and that also uses the index:
select obj from foo where (obj @> '{"a": {"a": {"a": {"a": 1}}}}'::jsonb);
a GIN index:
create table foo (
id text primary key,
obj jsonb
);
create index foo_obj on foo using gin (obj);
I populated the table with 10,000 rows of randomly generated JSON objects, with
random amounts of nesting. Here are the first few rows:
{"b": 0, "c": 3, "d": 2, "id": "0"}
{"b": 4, "c": {"a": 3, "b": 4, "c": 4, "d": 1}, "id": "1"}
{"a": 3, "b": 1, "c": 0, "d": 0, "id": "2"}
{"a": 3, "b": 2, "c": 4, "d": 1, "id": "3"}
{"a": 2, "b": {"b": 3, "c": 2, "d": 0}, "c": 3, "id": "4"}
{"c": 2, "id": "5"}
{"b": {"a": {"d": 3}, "b": 2, "c": 3, "d": 3}, "id": "6"}
{"a": 0, "b": 1, "id": "7"}
{"a": 1, "c": 3, "d": 3, "id": "8"}
{"a": 4, "b": 3, "c": 1, "d": 1, "id": "9"}
{"a": 0, "b": {"a": 3, "b": 0, "d": 3}, "d": 0, "id": "10"}
{"a": {"b": 2}, "b": 4, "c": 4, "d": 4, "id": "11"}
{"a": {"a": 2, "b": 3, "c": 1, "d": {"c": 4}}, "c": 1, "d": 0, "id": "12"}
...
There are two queries that I want to issue using the GIN index. The first
query, which seems to work, returns the 2 records in the test data set that
happen to have a nested key with a certain value. If I use EXPLAIN on the query,
it shows me that the index is being used.
postgres=# explain analyze select obj from foo where (obj @@ '$.a.a.a.a == 1'::jsonpath);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=44.01..48.02 rows=1 width=102) (actual time=3.531..5.402 rows=2 loops=1)
Recheck Cond: (obj @@ '($."a"."a"."a"."a" == 1)'::jsonpath)
Rows Removed by Index Recheck: 4211
Heap Blocks: exact=173
-> Bitmap Index Scan on foo_obj (cost=0.00..44.01 rows=1 width=0) (actual time=2.330..2.331 rows=4213 loops=1)
Index Cond: (obj @@ '($."a"."a"."a"."a" == 1)'::jsonpath)
Planning Time: 0.276 ms
Execution Time: 5.464 ms
(8 rows)
The second query is the one I'm having trouble with. For this one, I just want
to know if a nested key is present in the row -- I don't care what its value is.
This query works in the sense that it does return the 13 rows out of 10,000 that
happen to have the given nested key. However, EXPLAIN shows that the index
isn't being used.
postgres=# explain analyze select obj from foo where (obj @? '$.a.a.a.a'::jsonpath);
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..298.00 rows=1 width=102) (actual time=0.549..4.360 rows=13 loops=1)
Filter: (obj @? '$."a"."a"."a"."a"'::jsonpath)
Rows Removed by Filter: 9987
Planning Time: 0.164 ms
Execution Time: 4.384 ms
How can I fix the second query so it uses the GIN index? Maybe the first query
is inefficient too and I just don't understand what EXPLAIN is telling me? Is
there anything else I'm doing wrong or have muddled up? I'm new to JSONB.
P.S. For what its worth, I've come up with a variant of the first query that also
works, and that also uses the index:
select obj from foo where (obj @> '{"a": {"a": {"a": {"a": 1}}}}'::jsonb);