Hi Maxim,
Thanks for the suggestions. Unfortunately, it does appear to be an IO issue but not one I can get around (on my laptop at least!). Postgres seems to refuse to put any of the index or table in shared buffers at all. For some reason, there seems to be a very great increase in space required for having each event (item) on its own row as opposed to having it with the original items array within far fewer database rows:
relation | size
-------------------------------------+------------
public.myevents | 3502 MB
...
pg_toast.pg_toast_57487 | 800 MB
...
public.events | 2232 kB
...
myevents is where each event has a row to itself and events has 100 events in an array within the JSON rows (and the toast table above is where the data seem to be stored) - actually the events table has more data in it as there are the extra paging objects which have been removed from myevents. Performing vacuum analyze seems to have no effect either.
Getting back to my original point - you pointed out that for queries that need a decent % of the table it will be cheaper to do a scan, which is exactly what the query planner does for the relational version. If it only needs a small % of the values it looks at the index and for a large % it goes for a scan (it also puts everything in shared buffers and is lightening quick!). Is this just a lack of maturity in the jsonb planner or am I missing something?
Thanks again,
Anton
On 26 December 2014 at 14:19, Maxim Boguk <maxim.boguk@xxxxxxxxx> wrote:
-> Bitmap Heap Scan on myevents (cost=35.80..3615.09 rows=3716 width=0) (actual time=351.510..77669.907 rows=1417152 loops=1)Recheck Cond: (event @> '{"event": "delivered"}'::jsonb)Heap Blocks: exact=298362Buffers: shared hit=1 read=298589...Execution time: 80986.340 ms-> Bitmap Heap Scan onmyevents (cost=42.80..3622.09 rows=3716 width=0) (actual time=534.816..78526.944 rows=1417152 loops=1)Recheck Cond: (event @> '{"event": "delivered"}'::jsonb)Heap Blocks: exact=298362Buffers: shared hit=383 read=299133...Execution time: 81898.578 msHi Anton,What you see there (i think) - it's a performance hit of random disk read for non-cached database.Try increase a shared buffers to value when table and index could fit into, and redo queries few time until you see something like
Buffers: shared hit=bigvalue read=0 and compare performance, it might change timing quite a lot.Also, I recommend set track_io_timing=on in postgresql.conf and after it use explain (analyze, buffers, timing) to see check how much time database spent doing IO operations.Also try perform vacuum analyze myevents; before testing because it seems that you have no up to date visibility map on the table.
However, even in fully cached case selecting 40% on the table rows almost always will be faster via sequential scan, so I don't expect miracles.--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@xxxxxxxxx
МойКруг: http://mboguk.moikrug.ru/
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
echo '16i[q]sa[ln0=aln100%Pln100/snlbx]sbA0D4D465452snlbxq' | dc
This will help you for 99.9% of your problems ...
This will help you for 99.9% of your problems ...