On Thu, Jun 9, 2016 at 8:43 AM, Nicolas Paris <niparisco@xxxxxxxxx> wrote:
>
> 2016-06-09 15:31 GMT+02:00 Merlin Moncure <mmoncure@xxxxxxxxx>:
>>
>> On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> > Michael Paquier <michael.paquier@xxxxxxxxx> writes:
>> >> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <josh@xxxxxxxxxxxx> wrote:
>> >>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>> >>>> Will this 1GB restriction be raised in the near future?
>> >
>> >>> Not planned, no. Thing is, that's the limit for a field in general,
>> >>> not just JSON; changing it would be a fairly large patch. It's
>> >>> desirable, but AFAIK nobody is working on it.
>> >
>> >> And there are other things to consider on top of that, like the
>> >> maximum allocation size for palloc, the maximum query string size,
>> >> COPY, etc. This is no small project, and the potential side effects
>> >> should not be underestimated.
>> >
>> > It's also fair to doubt that client-side code would "just work" with
>> > no functionality or performance problems for such large values.
>> >
>> > I await with interest the OP's results on other JSON processors that
>> > have no issues with GB-sized JSON strings.
>>
>> Yup. Most JSON libraries and tools are going to be disgusting memory
>> hogs or have exponential behaviors, especially when you consider that
>> you are doing the transformation as well. Just prettifying JSON
>> documents over 1GB can be a real challenge.
>>
>> Fortunately the workaround here is pretty easy. Keep your query
>> exactly as it is, but remove the final aggregation step so that it
>> returns a set. Next, make a small application that runs this query
>> and does the array bits around each row (basically prepending the
>> final result with '[', appending it with ']', and putting ',' between
>> rows).
>
> The point is that prepending/appending means dealing with strings.
> Transforming each value of the result set into a string implies
> escaping the double quotes. Then:
> row1 contains {"hello":"world"}
> step 1 = prepend -> "[{\"hello\":\"world\"}"
> step 2 = append -> "[{\"hello\":\"world\"},"

Right. Say 3 rows each contain {"hello":"world"}:

before iteration: emit '['
before every row except the first, prepend ','
after iteration: emit ']'

You end up with:

[{"hello":"world"}
,{"hello":"world"}
,{"hello":"world"}]

...which is 100% valid JSON as long as each row of the set is a JSON
object. No escaping is needed: each row's text is already valid JSON,
and the client only adds the surrounding brackets and commas. In SQL,
the technique is like this:

select ('[' || string_agg(j::text, ',') || ']')::json
from
(
  select json_build_object('hello', 'world') j
  from generate_series(1,3)
) q;

The difference is that, instead of having the database do the
string_agg step, it's handled on the client while iterating over the
query's result set (a minimal client-side sketch follows below).

merlin
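
A minimal sketch of that client-side wrapping step, to make the
iteration concrete. It assumes Python with psycopg2, an illustrative
connection string ("dbname=test"), an output file name "out.json", and
the same toy query as the SQL example above; none of these details
come from the thread. The ::text cast keeps the driver from converting
each row into a Python object, so the raw JSON text of every row can
be streamed straight to the file.

import psycopg2

conn = psycopg2.connect("dbname=test")     # illustrative connection string (assumption)
cur = conn.cursor(name="json_stream")      # named (server-side) cursor: rows are fetched
                                           # in batches instead of all at once
cur.execute("""
    select json_build_object('hello', 'world')::text   -- ::text keeps each row as raw JSON text
    from generate_series(1, 3)
""")

with open("out.json", "w") as f:
    f.write("[")                           # before iteration: emit '['
    for i, (row,) in enumerate(cur):       # each row is already a valid JSON object
        if i > 0:
            f.write(",")                   # prepend ',' before every row except the first
        f.write(row)
    f.write("]")                           # after iteration: emit ']'

cur.close()
conn.close()

Using a named cursor keeps memory flat on both ends: the server hands
the set over in batches, and the client never holds more than one row
plus whatever it has already flushed to the output file.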