Re: array size exceeds the maximum allowed (1073741823) when building a json

2016-06-09 15:31 GMT+02:00 Merlin Moncure <mmoncure@xxxxxxxxx>:
On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Michael Paquier <michael.paquier@xxxxxxxxx> writes:
>> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <josh@xxxxxxxxxxxx> wrote:
>>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>>>> Is this 1GB restriction expected to be raised in the near future?
>
>>> Not planned, no.  Thing is, that's the limit for a field in general, not
>>> just JSON; changing it would be a fairly large patch.  It's desirable,
>>> but AFAIK nobody is working on it.
>
>> And there are other things to consider on top of that, like the
>> maximum allocation size for palloc, the maximum query string size,
>> COPY, etc. This is no small project, and the potential side-effects
>> should not be underestimated.
>
> It's also fair to doubt that client-side code would "just work" with
> no functionality or performance problems for such large values.
>
> I await with interest the OP's results on other JSON processors that
> have no issues with GB-sized JSON strings.

Yup.  Most json libraries and tools are going to be disgusting memory
hogs or show exponential behavior, especially when you consider you
are doing the transformation as well.  Just prettifying json documents
over 1GB can be a real challenge.

Fortunately the workaround here is pretty easy.  Keep your query
exactly as is, but remove the final aggregation step so that it returns
a set.  Next, make a small application that runs this query and does
the array bits around each row (basically prepending the result
with [, appending it with ], and putting , between rows).
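
A minimal sketch of that wrapper in Python with psycopg2 (the DSN, table,
and query are hypothetical stand-ins; the point is only the [ ] and comma
bookkeeping around each row):

    import sys
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
    cur = conn.cursor()
    # The original query with the final json_agg() removed, so it returns
    # one json value per row; ::text keeps the driver from parsing it.
    cur.execute("SELECT row_to_json(t)::text FROM big_table t")

    sys.stdout.write("[")
    first = True
    for (row_json,) in cur:
        if not first:
            sys.stdout.write(",")
        sys.stdout.write(row_json)  # splice the row's json in verbatim
        first = False
    sys.stdout.write("]\n")

Note that a plain cursor like this still buffers the whole result set on
the client; the streaming refinement Merlin describes below removes that
last buffer.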

The point is that prepending/appending means dealing with strings.
Transforming each value of the result set to a string implies escaping
the double quotes.  Then:
row 1 contains {"hello":"world"}
step 1 = prepend -> "[{\"hello\":\"world\"}"
step 2 = append  -> "[{\"hello\":\"world\"},"
and so on, so the json ends up corrupted.  Hopefully I am just on the
wrong track about that.
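
For what it's worth, the feared double-escaping only appears if the row
text is serialized a second time as a string value; splicing the received
text in verbatim keeps the quotes intact.  A tiny illustration (Python,
with the values hardcoded for the example):

    import json

    rows = ['{"hello":"world"}', '{"foo":"bar"}']  # json text as the driver delivers it

    # Correct: concatenate the text verbatim -- no escaping is introduced.
    print("[" + ",".join(rows) + "]")  # [{"hello":"world"},{"foo":"bar"}]

    # The corruption feared above comes from re-serializing the text as a
    # json *string value*:
    print(json.dumps(rows[0]))  # "{\"hello\":\"world\"}"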

 
It's essential that you use a client library that does not buffer the
entire result in memory before emitting results.  This can be done in
psql (FETCH mode), java, libpq (single row mode), etc.  I suspect the
node.js pg module can do this as well, and there will certainly be
others.
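
The psql route presumably means its FETCH_COUNT variable, which makes
psql fetch through a cursor in batches instead of buffering the whole
result; a sketch (the batch size and output path are illustrative):

    \set FETCH_COUNT 1000
    \pset tuples_only on
    \pset format unaligned
    \o /tmp/rows.json
    SELECT row_to_json(t)::text FROM big_table t;
    \o

The [ ] and commas still have to be added around the output afterwards,
as described above.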

The basic objective is that the rows are streamed out of the
database without being buffered.  If you do that, you should be able
to stream arbitrarily large result sets out of the database into a json
document, assuming the server can execute the query.
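
A streaming version of the earlier wrapper sketch, again with psycopg2
and hypothetical names: a named cursor makes the driver declare a
server-side cursor and FETCH in batches, so client memory stays flat no
matter how large the result set is.

    import sys
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
    with conn:
        # A *named* cursor is server-side: rows arrive itersize at a
        # time instead of the whole result set at once.
        with conn.cursor(name="json_stream") as cur:
            cur.itersize = 1000
            cur.execute("SELECT row_to_json(t)::text FROM big_table t")
            sys.stdout.write("[")
            first = True
            for (row_json,) in cur:
                if not first:
                    sys.stdout.write(",")
                sys.stdout.write(row_json)
                first = False
            sys.stdout.write("]\n")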

merlin

