> That very much depends on where exactly the failure happens - it might
> be both (or relying on an assumption that changed over). It's not clear
> if it fails during query execution or formatting the output (I seem to
> remember there were/are issues with rows containing multiple large bytea
> values, for example).
>
> Can you attach gdb to the backend, set breakpoint on that elog and share
> the backtrace? Then we can investigate further.
I can't easily get a meaningful query plan (i.e. from the live system)
as the query that failed is based on a temporary table built by some
pl/pgsql (the database is more of a conveyor belt than a long term store).
Unfortunately it's difficult to reproduce the ERROR. This has only
occurred a couple of times on a live system in the last year and these
queries run frequently.
I think we will move to 9.5.10 and see if it ever happens again; then if
it does get a backtrace on that newer version. Incidentally we never saw
this error on 8.4 where this code was running from 2009 until 2017.
> I'll bet you it's not that. It's quite unlikely that would fail with
> exactly 1GB request size. It seems much more like a buffer that we keep
> to be power of 2. The question is which one.
I had dismissed corruption before writing in. It's exactly 1GB every
time this has happened - and we can dump the full dataset periodically
without issue.
>> I have my money on a corrupted TOAST entry. Is this happening on
>> trustworthy hardware or beige box with no ECC or RAID?
It's good quality commercial hardware in our colo - no exactly sure what.
Thanks for taking the time to look at this!
Stefan
On 01/30/18 22:00, Tomas Vondra wrote:
On 01/30/2018 10:43 PM, Jan Wieck wrote:
On Tue, Jan 30, 2018 at 12:35 PM, Stefan Blanke
<stefan.blanke@xxxxxxxxxxxxxx <mailto:stefan.blanke@xxxxxxxxxxxxxx>> wrote:
Hello,
We've tripped over an error when doing a "COPY.. TO STDOUT WITH
BINARY" query.
"ERROR: invalid memory alloc request size 1073741824"
(exactly 1GB)
I have my money on a corrupted TOAST entry. Is this happening on
trustworthy hardware or beige box with no ECC or RAID?
I'll bet you it's not that. It's quite unlikely that would fail with
exactly 1GB request size. It seems much more like a buffer that we keep
to be power of 2. The question is which one.
regards