I have analyzed the PostgreSQL protocol using Wireshark (an open source packet analyzer), and I observed that the PostgreSQL backend, while doing a COPY ... FROM STDIN, reports errors as soon as possible (especially errors related to invalid data).
Therefore, the "late" reporting of errors while doing a COPY ... FROM STDIN is not a limitation of the underlying protocol; it is a limitation (or a design choice) of the libpq library.
It looks like this is a well-known issue, because it is listed on the todo list:
And it was discussed before:
Do you think it is possible to change that behavior, or work around it?
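One workaround, along the lines of what John suggests below, is to send the data in batches and end each batch with its own PQputCopyEnd, so that a bad row only invalidates one batch. A rough and untested sketch of what I mean (my_table, copy_batch, and the row format are just placeholders of mine):

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

/* Send one batch of rows (each a complete COPY text line ending in '\n')
 * and return 1 if the server accepted the whole batch, 0 otherwise. */
static int
copy_batch(PGconn *conn, const char *const *rows, int nrows)
{
    PGresult   *res;
    int         ok = 1;
    int         i;

    res = PQexec(conn, "COPY my_table FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN)
    {
        fprintf(stderr, "COPY did not start: %s", PQerrorMessage(conn));
        PQclear(res);
        return 0;
    }
    PQclear(res);

    for (i = 0; i < nrows; i++)
    {
        if (PQputCopyData(conn, rows[i], (int) strlen(rows[i])) != 1)
        {
            ok = 0;
            break;
        }
    }

    if (PQputCopyEnd(conn, ok ? NULL : "aborted by client") != 1)
        ok = 0;

    /* This is where the error, if any, finally shows up: only after
     * PQputCopyEnd, even if the server detected it on the first row. */
    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "batch rejected: %s", PQerrorMessage(conn));
            ok = 0;
        }
        PQclear(res);
    }
    return ok;
}

The caller would keep the rejected batches in an exception list and sort them out later, but with gigabytes of data the extra round trip per batch is exactly the kind of overhead I was hoping to avoid.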
While reading the libpq source code, I noticed the function pqParseInput3 (file fe-protocol3.c) ignores error responses while the connection is in PGASYNC_COPY_IN state. Maybe we can make a special case for the COPY FROM subprotocol and handle errors early, in order to make them available to PQgetResult? Is it feasible in a simple way, or is it a bad idea?
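Something along these lines, inside the message dispatch loop of pqParseInput3, is what I have in mind. This is a very rough and untested sketch; I'm assuming pqGetErrorNotice3 and the PGASYNC_READY state can be reused as-is:

/* Hypothetical special case: while in COPY IN state, stop ignoring
 * ErrorResponse ('E') messages and parse them immediately. */
if (conn->asyncStatus == PGASYNC_COPY_IN && id == 'E')
{
    /* Build the error PGresult now instead of discarding the message. */
    if (pqGetErrorNotice3(conn, true))
        return;             /* not enough data buffered yet; retry later */

    /* Leave the COPY IN state so that a subsequent PQgetResult call
     * can return the error to the application right away. */
    conn->asyncStatus = PGASYNC_READY;
    continue;
}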
Regards,
Nicolas Grilly
On Wed, Feb 2, 2011 at 20:06, John R Pierce <pierce@xxxxxxxxxxxx> wrote:
> On 02/02/11 10:20 AM, Nicolas Grilly wrote:
>> Is the copy protocol (aka PQputCopyData and PQputCopyEnd) designed to send gigabytes of data with just one "copy ... from stdin" query, and is there a way to be notified of a potential error before calling PQputCopyEnd? Or do I have to send my data in small chunks (for example batches of 10000 rows), issue a PQputCopyEnd, check for errors, and continue with the next chunk?
>
> I would batch the data, maybe 1000 lines or even 100 lines at a time if these errors are at all frequent. Put the errored batches in an exception list or something so you can sort them out later.