Shaheed Haque <shaheedhaque@xxxxxxxxx> writes:
> ======================
> foo =# explain analyse SELECT snapshot ->'company'->'legal_name' FROM
> paiyroll_payrun WHERE snapshot ->'employee' ? '2209';
>                             QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
>  Seq Scan on paiyroll_payrun  (cost=0.00..29.13 rows=9 width=32)
>                               (actual time=50.185..2520.983 rows=104 loops=1)
>    Filter: ((snapshot -> 'employee'::text) ? '2209'::text)
>    Rows Removed by Filter: 835
>  Planning Time: 0.075 ms
>  Execution Time: 2521.004 ms
> (5 rows)
> ======================
> So, over 2.5 seconds to sequentially scan ~850 rows. Am I right to
> presume the INDEX is not used because of the number of rows? Is there
> a way to verify that?

You could do "set enable_seqscan = off" and see if the EXPLAIN results
change.  My guess is that you'll find that the indexscan alternative is
costed at a bit more than 29.13 units and thus the planner thinks
seqscan is cheaper.

> And how can I understand the dreadful amount of
> time (of course, this is just on my dev machine, but still...)?

In the seqscan case, the -> operator is going to retrieve the whole
JSONB value from each row, which of course is pretty darn expensive if
it's a few megabytes.  Unfortunately the planner doesn't account for
detoasting costs when making such estimates, so it doesn't realize that
the seqscan case is going to be expensive.  (Fixing that has been on
the to-do list for a long time, but we seldom see cases where it
matters this much, so it hasn't gotten done.)

The problem would likely go away by itself if your table had more than
a few hundred rows, but if you don't anticipate that happening then you
need some sort of band-aid.  I don't recommend turning enable_seqscan
off as a production fix; it'd likely have negative effects on other
queries.  Personally I'd experiment with reducing random_page_cost a
bit to see if I could encourage use of the index that way.
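For instance, a quick session-local experiment (a sketch against the
table from the quoted plan; both settings are scoped to your session,
and RESET undoes them):

======================
-- Force the planner away from the seqscan so EXPLAIN shows
-- what the indexscan alternative would cost:
SET enable_seqscan = off;
EXPLAIN ANALYSE SELECT snapshot ->'company'->'legal_name'
  FROM paiyroll_payrun WHERE snapshot ->'employee' ? '2209';
RESET enable_seqscan;

-- Or nudge the cost model instead, which is closer to a
-- production-safe fix if it helps:
SET random_page_cost = 1.1;
EXPLAIN ANALYSE SELECT snapshot ->'company'->'legal_name'
  FROM paiyroll_payrun WHERE snapshot ->'employee' ? '2209';
RESET random_page_cost;
======================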
The default value of 4.0 is tuned for spinning-rust storage and is not
too appropriate for a lot of modern hardware, so there's probably room
to fix it that way without detuning your setup for other queries.

You should probably also rethink whether you really want to store your
data in this format, because anything at all that you do with that big
JSONB column is going to be expensive.  (Another thing that's been on
the to-do list for a while is enabling partial retrieval of large JSONB
values, but AFAIK that hasn't happened yet either.)

			regards, tom lane
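[Editor's note: one common way to act on that last suggestion is to
materialize the hot fields outside the big JSONB document.  This is a
sketch the thread itself doesn't show; the column and index names are
illustrative, and STORED generated columns require PostgreSQL 12+.]

======================
-- Keep the frequently-read scalar out of the multi-megabyte JSONB,
-- so reading it never detoasts the whole document:
ALTER TABLE paiyroll_payrun
  ADD COLUMN legal_name text
  GENERATED ALWAYS AS (snapshot -> 'company' ->> 'legal_name') STORED;

-- An expression GIN index on the sub-document; the default jsonb_ops
-- opclass supports the ? operator used in the original query:
CREATE INDEX paiyroll_payrun_employee_idx
  ON paiyroll_payrun USING gin ((snapshot -> 'employee'));
======================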