>> https://drive.google.com/file/d/0ByfjZX4TabhocUg2MFJ6a21qS2M/view?usp=sharing
>
> Note: due an error in dump script, if you are in Linux/Unix environment,
> use this command for uncompressing the file:
>
> bzip2 -d -c comment_test.dump.bz2 |sed -e '12d' > comment_test.dump

Hi,

I've played a bit with this. Here's what I see. First, a bit of info:

enrico=# \d stage.fbcomment
    Table "stage.fbcomment"
  Column  | Type  | Modifiers
----------+-------+-----------
 field_id | jsonb |
Indexes:
    "comment_test_idx" btree ((field_id ->> 'pageId'::text))

enrico=# select pg_total_relation_size('stage.fbcomment');
 pg_total_relation_size
------------------------
               83755008
(1 row)

enrico=# select count(*) from stage.fbcomment;
 count
-------
 23431
(1 row)

enrico=# select sum(jsonb_array_length(field_id ->'comment')) from stage.fbcomment;
  sum
--------
 541454
(1 row)

-> To keep in mind: there are 23k rows, but if you unnest the 'comment'
array there are 541k rows.

The following two queries are just fine on their own. I see the postgres
worker reaching a "RES" size of 108MB for both.

enrico=# explain analyze
enrico-# SELECT substring((field_id ->'comment')::text,1,1)
enrico-# FROM stage.fbcomment;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on fbcomment  (cost=0.00..3012.62 rows=23431 width=828) (actual time=0.147..2749.940 rows=23431 loops=1)
 Planning time: 0.046 ms
 Execution time: 2756.881 ms
(3 rows)

Time: 2757.398 ms

enrico=# explain analyze
enrico-# SELECT jsonb_array_elements(field_id ->'comment')->>'id'
enrico-# FROM stage.fbcomment;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on fbcomment  (cost=0.00..14552.39 rows=2343100 width=828) (actual time=0.067..885.041 rows=541454 loops=1)
 Planning time: 0.053 ms
 Execution time: 978.161 ms
(3 rows)

Time: 978.705 ms

Interestingly, if you combine the two, it quickly blows up! The following
query with a limit of 1000 already drives RES to well over 1GB; with
larger limits it quickly thrashes my machine.

enrico=# explain analyze
SELECT substring((field_id ->'comment')::text,1,1),
       jsonb_array_elements(field_id ->'comment')->>'id'
FROM stage.fbcomment limit 1000;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..6.31 rows=1000 width=828) (actual time=0.200..2419.749 rows=1000 loops=1)
   ->  Seq Scan on fbcomment  (cost=0.00..14786.70 rows=2343100 width=828) (actual time=0.198..2418.931 rows=1000 loops=1)
 Planning time: 0.059 ms
 Execution time: 2659.065 ms
(4 rows)

Time: 2659.708 ms

I think this combination triggers some code path that is not really optimal
for memory usage. I don't know whether there is something worth fixing here
or not; I guess other people will quickly see what happens.

In any case, the solution for you might be to unnest the comments in this
table and split the '{' vs. '[' cases before doing your processing, i.e.
create an intermediate table with the 541454 comments and then run your
queries against that table. This should also use far less processing time
than the hack with the '[' vs. '{' cases. (A rough sketch is in the P.S.
below.)

Bye,
Chris.
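
P.S. To make the intermediate-table idea concrete, here is a minimal
sketch. It assumes that the '[' vs. '{' distinction means 'comment'
sometimes holds an array of comment objects and sometimes a single
object; the table name stage.fbcomment_flat and the column names are
just placeholders, so adjust them to your schema:

-- names below (fbcomment_flat, page_id, comment) are placeholders
CREATE TABLE stage.fbcomment_flat AS
SELECT field_id ->> 'pageId'                       AS page_id,
       jsonb_array_elements(field_id -> 'comment') AS comment  -- the '[' case: unnest the array
FROM stage.fbcomment
WHERE jsonb_typeof(field_id -> 'comment') = 'array'
UNION ALL
SELECT field_id ->> 'pageId',
       field_id -> 'comment'                                   -- the '{' case: keep the single object
FROM stage.fbcomment
WHERE jsonb_typeof(field_id -> 'comment') = 'object';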
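
Your per-comment queries then only have to read rows that are already
unnested, e.g.:

SELECT comment ->> 'id'
FROM stage.fbcomment_flat;

If you filter by page, an index on page_id should give you roughly what
comment_test_idx gives you on the original table.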