> -----Original Message-----
> From: Justin Pryzby <pryzby@xxxxxxxxxxxxx>
> Sent: Tuesday, March 1, 2022 14:27
> To: ldh@xxxxxxxxxxxxxxxxxx
> Cc: pgsql-performance@xxxxxxxxxxxxxx
> Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)
>
> On Tue, Mar 01, 2022 at 04:28:31PM +0000, ldh@xxxxxxxxxxxxxxxxxx wrote:
> > Now, there is an additional component I think... Storage is on an array and I am not getting a clear answer as to where it is 😊 Is it possible that something is happening at the storage layer? Could that be reported as a network issue vs a storage issue for Postgres?
>
> No. If there were an error with storage, it'd be reported as a local error, and the query would fail, rather than failing with client-server communication.
>
> > Also, both machines are actually VMs. I forgot to mention that and not sure if that's relevant.
>
> Are they running on the same hypervisor ? Is that hyperv ?
> Lacking other good hypotheses, that does seem relevant.
>
> --
> Justin

The issue happened again last night. I did implement your recommendations, but they did not seem to prevent it:

tcp_keepalives_idle=9          # TCP_KEEPIDLE, in seconds;
                               # 0 selects the system default
tcp_keepalives_interval=9      # TCP_KEEPINTVL, in seconds;
                               # 0 selects the system default
tcp_keepalives_count=0         # TCP_KEEPCNT;
                               # 0 selects the system default
#tcp_user_timeout = 0          # TCP_USER_TIMEOUT, in milliseconds;
                               # 0 selects the system default

(A rough client-side JDBC equivalent is sketched in the P.S. at the end of this message.)

On the client application, the exceptions are:

2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : Unexpected error
2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : org.pentaho.di.core.exception.KettleStepException:
2022/03/03 01:04:56 - Upsert2.0 - Error in step, asking everyone to stop because of:
2022/03/03 01:04:56 - Upsert2.0 -
2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to the backend.
2022/03/03 01:04:56 - Upsert2.0 -
2022/03/03 01:04:56 - Upsert2.0 -
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:313)
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2022/03/03 01:04:56 - Upsert2.0 -     at java.lang.Thread.run(Thread.java:745)
2022/03/03 01:04:56 - Upsert2.0 - Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to the backend.
2022/03/03 01:04:56 - Upsert2.0 -
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.core.database.Database.insertRow(Database.java:1321)
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.core.database.Database.insertRow(Database.java:1245)
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.core.database.Database.insertRow(Database.java:1233)
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.lookupValues(InsertUpdate.java:163)
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:299)
2022/03/03 01:04:56 - Upsert2.0 -     ... 2 more
2022/03/03 01:04:56 - Upsert2.0 - Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:382)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:134)
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.core.database.Database.insertRow(Database.java:1288)
2022/03/03 01:04:56 - Upsert2.0 -     ... 6 more
2022/03/03 01:04:56 - Upsert2.0 - Caused by: java.net.SocketException: Connection reset
2022/03/03 01:04:56 - Upsert2.0 -     at java.net.SocketInputStream.read(SocketInputStream.java:209)
2022/03/03 01:04:56 - Upsert2.0 -     at java.net.SocketInputStream.read(SocketInputStream.java:141)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:113)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.core.PGStream.receiveChar(PGStream.java:453)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2119)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355)
2022/03/03 01:04:56 - Upsert2.0 -     ... 11 more

On the DB:

2022-03-03 01:04:40 EST [21228] LOG: could not receive data from client: An existing connection was forcibly closed by the remote host.
2022-03-03 01:04:40 EST [21228] LOG: unexpected EOF on client connection with an open transaction
2022-03-03 01:04:40 EST [21228] LOG: disconnection: session time: 0:02:07.570 user=postgres database=Pepper host=10.64.17.211 port=63686
2022-03-03 01:04:41 EST [21160] LOG: could not receive data from client: An existing connection was forcibly closed by the remote host.
2022-03-03 01:04:41 EST [21160] LOG: unexpected EOF on client connection with an open transaction
2022-03-03 01:04:41 EST [21160] LOG: disconnection: session time: 0:02:07.730 user=postgres database=Pepper host=10.64.17.211 port=63688

I don't know if it is meaningful, but there is a ~15 s gap between the timestamps on the database and on the application, even though the servers' clocks are synchronized properly.

I have asked the IT team to look at the VMs and see if anything strange is happening. They are not too happy about installing Wireshark for further analysis, given the "complexity of the tools and size of the logs" 😊, but I keep pushing.

Thank you,
Laurent.
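P.S. In case it is useful, here is a minimal, untested sketch of what I mean by a client-side equivalent, using pgJDBC's tcpKeepAlive and socketTimeout connection parameters. The host, port, and password below are placeholders rather than our real environment, and in practice the same options would be set in the Pentaho connection properties instead of standalone code like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Properties;

public class KeepaliveCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "postgres");
        props.setProperty("password", "********");   // placeholder
        // Ask the driver to enable SO_KEEPALIVE on the client socket.
        props.setProperty("tcpKeepAlive", "true");
        // Fail a read after 300 seconds instead of hanging on a dead peer.
        props.setProperty("socketTimeout", "300");

        // Placeholder host; the database name is taken from the server log above.
        String url = "jdbc:postgresql://dbhost.example:5432/Pepper";

        try (Connection conn = DriverManager.getConnection(url, props);
             PreparedStatement st = conn.prepareStatement("SELECT 1")) {
            st.executeQuery();
            System.out.println("Connection OK with tcpKeepAlive/socketTimeout set");
        }
    }
}

If the Pentaho connection already sets these options, this is moot; it is only meant to show which knobs exist on the driver side.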