Re: performance issue with bitmap index scans on huge amounts of big jsonb documents

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 30, 2016 at 6:26 AM, Marc-Olaf Jaschke
<marc-olaf.jaschke@xxxxxxx> wrote:
> Hi,
>
> i have a performance issue with bitmap index scans on huge amounts of big jsonb documents.
>
>
> ===== Background =====
>
> - table with big jsonb documents
> - gin index on these documents
> - queries using index conditions with low selectivity
>
>
> ===== Example =====
>
> select version();
>> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit
>
> show work_mem;
>> 1GB
>
> -- setup test data
> create table bitmap_scan_test as
> select
> i,
> (select jsonb_agg(jsonb_build_object('x', i % 2, 'filler', md5(j::text))) from generate_series(0, 100) j) big_jsonb
> from
> generate_series(0, 100000) i;
>
> create index on bitmap_scan_test using gin (big_jsonb);
>
> analyze bitmap_scan_test;
>
>
> --  query with bitmap scan
> explain analyze
> select
> count(*)
> from
> bitmap_scan_test
> where
> big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> Aggregate  (cost=272.74..272.75 rows=1 width=8) (actual time=622.272..622.272 rows=1 loops=1)
>   ->  Bitmap Heap Scan on bitmap_scan_test  (cost=120.78..272.49 rows=100 width=0) (actual time=16.496..617.431 rows=50000 loops=1)
>         Recheck Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
>         Heap Blocks: exact=637
>         ->  Bitmap Index Scan on bitmap_scan_test_big_jsonb_idx  (cost=0.00..120.75 rows=100 width=0) (actual time=16.371..16.371 rows=50000 loops=1)
>               Index Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
> Planning time: 0.106 ms
> Execution time: 622.334 ms
>
>
> perf top -p... shows heavy usage of pglz_decompress:
>
> Overhead  Shared Object     Symbol
>   51,06%  postgres          [.] pglz_decompress
>    7,33%  libc-2.12.so      [.] memcpy

Another thing to possibly look at is configuring the column not to
compress; over half the time is spent decompressing the data.  See:
ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

Naturally this is a huge tradeoff so do some careful analysis before
making the change.

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux