Re: 13.4 on RDS, SSL SYSCALL EOF on restore

Bruce Momjian <bruce@xxxxxxxxxx> · Fri, 8 Oct 2021 17:35:35 -0400

On Fri, Oct  8, 2021 at 02:27:55PM -0700, Wells Oliver wrote:
> Hi: I am restoring a ~100GB backup using 16 jobs from an EC2 instance to an RDS
> instance (db.m6g.xlarge, which is 16GB RAM and 4 CPU) and it's dying midway
> with the dreaded "SSL SYSCALL error: EOF detected" error.
> 
> I did create a parameter group to hopefully speed the restoration process, it
> includes:
> 
> - wal_buffers 8192 (64MB)
> - checkpoint_timeout 3600 (1h)
> - min_wal_size 192 (192MB)
> - max_wal_size 102400 (100GB)
> - shared_buffers 524288 (4GB)
> - synchronous_commit 0 (off)
> - autovacuum 0 (off)
> - maintenance_work_mem 2097152 (2GB)
> - work_mem 32768 (32MB)
> 
> I sourced these from a few different folks as well as some trial and error, but
> now it's blowing up on me. 
> 
> If I revert the RDS instance back to default PG parameters, it restores, but it
> takes 3x the time.

Wow, that is weird.  I see that error string happening when PG can't
extend the receive buffer on the client side:

     appendPQExpBufferStr(&conn->errorMessage,
                   libpq_gettext("SSL SYSCALL error: EOF detected\n"));

Can you check the server logs to see if there is any error there?  If I
had to take a guess, I would reduce maintenance_work_mem to 1GB and
retest.
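
As a rough sanity check on that guess, here is a back-of-the-envelope
sketch of the memory arithmetic.  It converts the raw parameter group
values quoted above into bytes (8kB blocks for shared_buffers and
wal_buffers, kB for the work_mem settings, MB for the WAL sizes) and
computes a pessimistic upper bound.  The bound assumes, and this is only
an assumption, that all 16 restore jobs are building indexes at the same
moment and each one uses its full maintenance_work_mem.  The helper name
worst_case_bytes is made up for illustration.

```python
# Raw values from the quoted parameter group, in PostgreSQL's native units.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

BLOCK = 8 * KB  # default block size; shared_buffers and wal_buffers count blocks

settings_bytes = {
    "wal_buffers": 8192 * BLOCK,           # 8kB blocks
    "shared_buffers": 524288 * BLOCK,      # 8kB blocks
    "maintenance_work_mem": 2097152 * KB,  # kB
    "work_mem": 32768 * KB,                # kB
    "min_wal_size": 192 * MB,              # MB
    "max_wal_size": 102400 * MB,           # MB
}

INSTANCE_RAM = 16 * GB  # db.m6g.xlarge, per the original post
RESTORE_JOBS = 16       # pg_restore job count, per the original post


def worst_case_bytes(maintenance_work_mem, jobs=RESTORE_JOBS):
    """Pessimistic bound: shared_buffers plus one full
    maintenance_work_mem allocation per concurrent restore job."""
    return settings_bytes["shared_buffers"] + jobs * maintenance_work_mem


current = worst_case_bytes(settings_bytes["maintenance_work_mem"])
reduced = worst_case_bytes(1 * GB)

print(f"maintenance_work_mem=2GB: {current / GB:.0f} GB worst case")
print(f"maintenance_work_mem=1GB: {reduced / GB:.0f} GB worst case")
print(f"instance RAM:             {INSTANCE_RAM / GB:.0f} GB")
```

Under those assumptions the bound is 36GB with the current setting and
20GB at 1GB, against 16GB of RAM.  This is a ceiling, not a prediction,
since not every job will be sorting for an index at once, but it suggests
memory pressure on the server is worth ruling out in the logs.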

-- 
  Bruce Momjian  <bruce@xxxxxxxxxx>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.