> -----Original Message-----
> From: Justin Pryzby <pryzby@xxxxxxxxxxxxx>
> Sent: Monday, February 28, 2022 17:05
> To: ldh@xxxxxxxxxxxxxxxxxx
> Cc: pgsql-performance@xxxxxxxxxxxxxx
> Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)
>
> On Mon, Feb 28, 2022 at 09:43:09PM +0000, ldh@xxxxxxxxxxxxxxxxxx wrote:
> > On Wed, Feb 23, 2022 at 07:04:15PM -0600, Justin Pryzby wrote:
> > > > And the aforementioned network trace. You could set a capture filter on TCP
> > > > SYN|RST so it's not absurdly large. From my notes, it might look like this:
> > > > (tcp[tcpflags]&(tcp-rst|tcp-syn|tcp-fin)!=0)
> > >
> > > I'd also add '|| icmp'. My hunch is that you'll see some ICMP (not "ping")
> > > being sent by an intermediate gateway, resulting in the connection being
> > > reset.
> >
> > I am so sorry but I do not understand what you are asking me to do. I am
> > unfamiliar with these commands. Is this a postgres configuration file? Is
> > this something I just do once or something I leave on to hopefully catch it
> > when the issue occurs? Is this something to do on the DB machine or the ETL
> > machine? FYI:
>
> It's no problem.
>
> I suggest that you run wireshark with a capture filter to try to show *why*
> the connections are failing. I think the capture filter might look like:
>
> (icmp || (tcp[tcpflags] & (tcp-rst|tcp-syn|tcp-fin)!=0)) && host 10.64.17.211
>
> With the "host" filtering for the IP address of the *remote* machine.
>
> You could run that on whichever machine is more convenient and leave it
> running for however long it takes for that error to happen. You'll be able to
> save a .pcap file for inspection. I suppose it'll show either a TCP RST or an
> ICMP. Whichever side sent that is where the problem is. I still suspect the
> issue isn't in postgres.
>
> > - My ETL machine is on 10.64.17.211
> > - My DB machine is on 10.64.17.210
> > - Both on Windows Server 2012 R2, x64
>
> These network details make my theory unlikely.
>
> They're on the same subnet with no intermediate gateways, and communicate
> directly via a hub/switch/crossover cable. If that's true, then both will
> have each other's hardware address in ARP after pinging from one to the
> other.
>
> --
> Justin

Yes, the machines ARE on the same subnet. I have been told they are even on the same physical rack. When I run a tracert, I get this:

Tracing route to PRODDB.xxx.int [10.64.17.210]
over a maximum of 30 hops:

  1     1 ms    <1 ms    <1 ms  PRODDB.xxx.int [10.64.17.210]

Trace complete.

Now, there is an additional component I think... Storage is on an array and I am not getting a clear answer as to where it is 😊 Is it possible that something is happening at the storage layer? Could Postgres report that as a network issue rather than a storage issue?

Also, both machines are actually VMs. I forgot to mention that, and I am not sure whether it is relevant.

Thank you,
Laurent.
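For anyone following the thread who is as unfamiliar with capture filters as I was, here is a rough Python sketch of what Justin's suggestion seems to amount to. It only builds the filter string quoted above and an argument list for dumpcap, the command-line capture tool that ships with Wireshark; it does not start a capture. The interface name "Ethernet" and the output file name "pg_reset.pcap" are placeholders I made up, and the helper function names are hypothetical. The small flag test is only there to illustrate which TCP packets the filter keeps.

```python
from typing import List

# TCP flag bits as they appear in the TCP header flags byte.
TCP_FIN = 0x01
TCP_SYN = 0x02
TCP_RST = 0x04


def build_capture_filter(remote_host: str) -> str:
    """Return the capture filter from the thread for the given remote host."""
    return (
        "(icmp || (tcp[tcpflags] & (tcp-rst|tcp-syn|tcp-fin) != 0))"
        f" && host {remote_host}"
    )


def matches_flag_test(tcp_flags: int) -> bool:
    """Mirror the tcp[tcpflags] test: true if any of RST, SYN, FIN is set."""
    return (tcp_flags & (TCP_RST | TCP_SYN | TCP_FIN)) != 0


def build_dumpcap_command(interface: str, remote_host: str, out_file: str) -> List[str]:
    """Build (but do not run) a dumpcap command line.

    -i selects the interface, -f sets the capture filter, -w names the
    output file. The interface and file names are caller-supplied
    placeholders, not values taken from the thread.
    """
    return [
        "dumpcap",
        "-i", interface,
        "-f", build_capture_filter(remote_host),
        "-w", out_file,
    ]


if __name__ == "__main__":
    # Example as it might be run on the DB machine, where the ETL machine
    # (10.64.17.211) is the remote side.
    print(build_dumpcap_command("Ethernet", "10.64.17.211", "pg_reset.pcap"))
```

If the capture runs on the ETL machine instead, the remote host would be 10.64.17.210. Running `dumpcap -D` should list the interface names actually available on the machine. The resulting .pcap file can then be opened in Wireshark to see which side sent the RST or ICMP packet.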