>> https://drive.google.com/file/d/0ByfjZX4TabhocUg2MFJ6a21qS2M/view?usp=sharing
>
> Note: due an error in dump script, if you are in Linux/Unix environment,
> use this command for uncompressing the file:
>
> bzip2 -d -c comment_test.dump.bz2 |sed -e '12d' > comment_test.dump

Hi,

I've played a bit with this. Here's what I see. First, a bit of info:

enrico=# \d stage.fbcomment
    Table "stage.fbcomment"
  Column  | Type  | Modifiers
----------+-------+-----------
 field_id | jsonb |
Indexes:
    "comment_test_idx" btree ((field_id ->> 'pageId'::text))

enrico=# select pg_total_relation_size('stage.fbcomment');
 pg_total_relation_size
------------------------
               83755008
(1 row)

enrico=# select count(*) from stage.fbcomment;
 count
-------
 23431
(1 row)

enrico=# select sum(jsonb_array_length(field_id ->'comment')) from stage.fbcomment;
  sum
--------
 541454
(1 row)

-> To keep in mind: there are 23k rows, but if you unnest the 'comment'
array there are 541k rows.

The following two queries are just fine on their own. I see the postgres
worker reaching a "RES" size of 108MB for both.

enrico=# explain analyze
enrico-# SELECT substring((field_id ->'comment')::text,1,1)
enrico-# FROM stage.fbcomment;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on fbcomment  (cost=0.00..3012.62 rows=23431 width=828) (actual time=0.147..2749.940 rows=23431 loops=1)
 Planning time: 0.046 ms
 Execution time: 2756.881 ms
(3 rows)

Time: 2757.398 ms

enrico=# explain analyze
enrico-# SELECT jsonb_array_elements(field_id ->'comment')->>'id'
enrico-# FROM stage.fbcomment;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on fbcomment  (cost=0.00..14552.39 rows=2343100 width=828) (actual time=0.067..885.041 rows=541454 loops=1)
 Planning time: 0.053 ms
 Execution time: 978.161 ms
(3 rows)

Time: 978.705 ms

Interestingly, if you combine the two, it quickly blows up! The following
query with a limit of 1000 already drives RES to well over 1GB; with
larger limits it quickly thrashes my machine.

enrico=# explain analyze
SELECT substring((field_id ->'comment')::text,1,1),
       jsonb_array_elements(field_id ->'comment')->>'id'
FROM stage.fbcomment limit 1000;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..6.31 rows=1000 width=828) (actual time=0.200..2419.749 rows=1000 loops=1)
   ->  Seq Scan on fbcomment  (cost=0.00..14786.70 rows=2343100 width=828) (actual time=0.198..2418.931 rows=1000 loops=1)
 Planning time: 0.059 ms
 Execution time: 2659.065 ms
(4 rows)

Time: 2659.708 ms

I think this combination triggers some code path that is not really optimal
for memory usage. I don't know whether there is something worth fixing here
or not; I guess other people will quickly see what happens.

In any case, the solution for you might be to unnest the comments in this
table and split the '{' vs. '[' cases before doing your processing, i.e.
create an intermediate table with the 541454 comments and then run your
queries against that table. This should also use far less processing time
than the hack with the '[' vs. '{' cases. (A rough sketch is in the P.S.
below.)

Bye,
Chris.
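
P.S. To make the intermediate-table idea concrete, here is a minimal
sketch. It assumes that the '[' vs. '{' distinction means 'comment'
sometimes holds an array of comment objects and sometimes a single
object; the table name stage.fbcomment_flat and the column names are
just placeholders, so adjust them to your schema:

-- names below (fbcomment_flat, page_id, comment) are placeholders
CREATE TABLE stage.fbcomment_flat AS
SELECT field_id ->> 'pageId'                       AS page_id,
       jsonb_array_elements(field_id -> 'comment') AS comment  -- the '[' case: unnest the array
FROM stage.fbcomment
WHERE jsonb_typeof(field_id -> 'comment') = 'array'
UNION ALL
SELECT field_id ->> 'pageId',
       field_id -> 'comment'                                   -- the '{' case: keep the single object
FROM stage.fbcomment
WHERE jsonb_typeof(field_id -> 'comment') = 'object';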
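
Your per-comment queries then only have to read rows that are already
unnested, e.g.:

SELECT comment ->> 'id'
FROM stage.fbcomment_flat;

If you filter by page, an index on page_id should give you roughly what
comment_test_idx gives you on the original table.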