Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC in pg_wal "No space left on device"

Achilleas Mantzios <achill@xxxxxxxxxxxxxxxxxxxxx> · Wed, 21 Nov 2018 08:03:57 +0200



    On 20/11/18 10:48 μ.μ., Rui DeSousa
      wrote:

    
          On Nov 20, 2018, at 3:34 PM, Achilleas Mantzios
            <achill@xxxxxxxxxxxxxxxxxxxxx>
            wrote:
          

          Hey, I was reading the docs, it
              seems it means :

            
            net.ipv4.tcp_keepalive_time +
              net.ipv4.tcp_keepalive_intvl *
              net.ipv4.tcp_keepalive_probes = 2hrs 11 Mins 15 Secs,
              rather than 18 Hrs
        
      
      Yeah, that’s correct.  I wonder why it didn’t
        terminate.
    
    
    Most probably because there was another created clone, cloud
    migration magic, that's my theory, albeit not confirmed by the
    provider. Logical worker (walreceiver) was still alive and happy
    even after the primary crushed. I have the logs from the other
    standby and it immediately detected the problem (PANIC on the
    primary) and retried. No firewall dropping packets, in every test I
    did, the logical bgworker detects any problems *instantly*, and
    retries after 5 secs by default.