Re: Basebackup fails without useful error message

Hello Adrian, and everyone else.

It has finally happened: the backup ran into an error again, and the verbose output set me on the right path.

I'm getting this error message:

> pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.

Combined with the main server logging:

> terminating walsender process due to replication timeout

Now, the server is set up with an archive_command which gzips the WAL files and writes them to a network filesystem.

From looking at machine metrics at the time, my conclusion is the following:

At the time of the error, the remote filesystem experienced a very high queue size for new writes.

So I'm assuming that, when an archive_command is set, writing a WAL file is only considered finished once the archived copy has been written, not as soon as the file is written to pg_wal.

I'm also seeing in the documentation that the default WAL method for pg_basebackup is "stream", which waits for these WAL files as they are produced.

I suspect that I have 2 possible paths at this point:

1: increase wal_sender_timeout
2: run the basebackup with --wal-method=none, since my restore_command is already set up to fetch the archived WAL files from that very same network storage.
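
As a rough sketch, the two options above might translate into commands like the following. The connection details and target directory are taken from the pg_basebackup command quoted further down in this thread; the 5min value and the postgres superuser used for psql are assumptions for illustration, not tested recommendations. The script only prints the commands, so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of executing them.

# Option 1: raise wal_sender_timeout on the server, then reload the config.
# $1 = new timeout value (5min below is an arbitrary example value).
# The "postgres" superuser is an assumption; use whatever role may run ALTER SYSTEM.
timeout_cmd() {
    printf '%s\n' "psql -h localhost -p 5432 -U postgres -c \"ALTER SYSTEM SET wal_sender_timeout = '$1'\" -c \"SELECT pg_reload_conf()\""
}

# Option 2: take the base backup with an explicit WAL method.
# $1 = "stream" (the default) or "none" (no WAL is included in the backup).
basebackup_cmd() {
    printf '%s\n' "pg_basebackup -h localhost -p 5432 -U basebackup_user -D /mnt/base_backup/dir -Ft -z -P --verbose --wal-method=$1"
}

timeout_cmd 5min
basebackup_cmd none
```

Note that with --wal-method=none the backup itself contains no WAL, so it can only be restored if the archive holds every WAL segment generated while the backup was running.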

I'm going to be testing this. If someone could confirm that this is how writing WAL files works, namely that a file is only considered "done" once the archive_command has finished, that would be great.

Regards,
Koen De Groote


On Sun, Sep 29, 2024 at 6:08 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 9/29/24 08:57, Koen De Groote wrote:
>  > What is the complete command you are using?
>
> The full command is:
>
> pg_basebackup -h localhost -p 5432 -U basebackup_user -D /mnt/base_backup/dir -Ft -z -P
>
> So the output format is tar, gzipped, with progress being printed.
>
>  > Have you looked at the Postgres log?
>
>  > Is --verbose being used?
>
> This is straight from the logs, it's the only output besides the %
> progress counter.
>
> Will have a look at --verbose.

When you report back on that, and if it does not show the error, please also provide:

Postgres version.

OS and version.

Anything special about the cluster like tablespaces, extensions,
replication, etc.


>
> Regards,
> Koen De Groote
>

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx

