PostgreSQL 9.5: Streaming Replication: Secondaries Fail to Start After WAL Error

Dear Community,

I am trying to understand why all of our secondary databases failed to start
after they had been logging a WAL-related error for some time.

Timeline:

- 2024-04-19: WAL errors appear on the secondary database nodes

```
LOG: invalid resource manager ID 55 at 40/F46CBCA8
```

- the secondaries did not lag in replication
  - monitored by querying
```
pg_last_xact_replay_timestamp()
```

- 2024-05-02: the secondaries reboot and fail to start up

```
FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000004100000049 has already been removed
FATAL:  the database system is starting up
```
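To relate the two log messages to each other, the LSN from the first error and the segment name from the second can be put on the same scale. The following is a minimal sketch, not anything taken from PostgreSQL itself: it assumes the default 16 MB WAL segment size and that the LSN belongs to timeline 1 (as the missing segment's name suggests). The helper names are hypothetical.

```python
# Assumption: default 16 MB WAL segment size and the 24-hex-digit
# file naming used by PostgreSQL 9.3 and later.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_LOG = 0x100000000 // WAL_SEGMENT_SIZE  # 256 segments per "log" id


def parse_segment_name(name):
    """Split a WAL file name into (timeline, log, segment) integers."""
    if len(name) != 24:
        raise ValueError("expected a 24-hex-digit WAL file name")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def segment_name_for_lsn(lsn, timeline=1):
    """Return the WAL file name containing an LSN written as 'hi/lo' in hex.

    The timeline defaults to 1, which is an assumption for this example.
    """
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    return "%08X%08X%08X" % (timeline, hi, lo // WAL_SEGMENT_SIZE)


def segment_number(name):
    """Return a running segment counter so two file names can be compared."""
    _, log, seg = parse_segment_name(name)
    return log * SEGMENTS_PER_LOG + seg


error_lsn_file = segment_name_for_lsn("40/F46CBCA8")   # LSN from the 2024-04-19 message
missing_file = "000000010000004100000049"              # segment from the 2024-05-02 message

print(error_lsn_file)                                            # 0000000100000040000000F4
print(segment_number(missing_file) - segment_number(error_lsn_file))  # 85
```

Under these assumptions, the segment requested at startup lies 85 segments after the one containing the LSN in the first error.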

From my understanding, WAL is streamed over the network (the secondary pulls from the primary) and written to a WAL file on the secondary.
A separate process then replays the WAL from that local file.
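Since receiving and replaying are separate steps, the gap between them can be measured in bytes as well as by timestamp. This is a rough sketch: the SQL uses the function names available in 9.5, while the Python helpers and the sample LSN values are made up for illustration and are not from the incident.

```python
# Query to run on a secondary. These function names are the 9.5 ones.
STANDBY_POSITIONS_SQL = """
SELECT pg_last_xlog_receive_location() AS received,
       pg_last_xlog_replay_location()  AS replayed,
       pg_last_xact_replay_timestamp() AS last_replayed_commit;
"""


def lsn_to_int(lsn):
    """Convert an LSN written as 'hi/lo' (hex) into a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replay_gap_bytes(received, replayed):
    """Bytes of WAL received from the primary but not yet replayed."""
    return lsn_to_int(received) - lsn_to_int(replayed)


# Hypothetical sample values, as the query above might return them.
print(replay_gap_bytes("0/3000060", "0/3000000"))  # 96
```

A gap of zero only says that everything received so far has been replayed. It does not show how far the secondary is behind the primary's current write position.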

For the local WAL file to go out of sync, I can think of these possibilities:

1. the primary removed the WAL file while the secondary was still streaming it
2. the WAL file on the secondary got corrupted
3. ...

Questions:

- What do those error messages mean?
- How can I prevent this from happening?

References:

- https://www.postgresql.org/docs/9.5/wal-configuration.html

Any advice or information is highly appreciated.
Thank you,
Mohan
