Hi,
We have a table which has a jsonb column in it. Each row contains a lot
of data in that column, so TOASTed.
We have to extract data from that column at different levels, so an
example query could look like
select
col1,
col2,
jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5.val1."text()"')
as val1,
jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5.val2."text()"')
as val2,
jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5.val3."text()"')
as val3
from tbl
where
id = 1;
I tried to rewrite it to
WITH foo AS (select
id,
col1,
col2,
jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5') as jsondata,
from tbl )
select
col1,
col2,
jsondata->val1->'text()' as val1,
jsondata->val2->'text()' as val2,
jsondata->val3->'text()' as val3
from foo
where
id = 1;
However, WITH has the same run-time profile - most of the time is spent
in pglz_decompress. Using the -> notation has the same profile.
The more data I extract from the JSON object the slower the query gets.
Of course, if I change the column to EXTERNAL we see a ~3.5 x speedup in
the queries but disk space requirements goes up by too much.
(We need to use a jsonb column as the data is unstructured, and may
differ in structure between rows. Yes, yes, I know...)
PostgreSQL 12.x on RHEL.
If anybody has some good ideas it would be appreciated.
Thanks in advance !
Best regards,
Jesper