Thanks Keith this is useful.
One more query if I need to know that if a have fallen too far behind and the WAL is not available. I guess I can do this. Let me know if my understanding is correct.
- Run pg_controldata <DATA_DIR> on the slave node which has been down for long.
- It will output the details about the WAL along with the WAL file name from where it will start the replication. (Field to look for in the output– 'Latest checkpoint's REDO WAL file')
- Then check if the file mentioned in ` 'Latest checkpoint's REDO WAL file' is present in master `pg_wal` directory. If not then slave have fallen too far behind and will not be able to recover from WAL.
On Mon, Jun 18, 2018 at 7:39 PM, Keith <keith@xxxxxxxxxxx> wrote:
The portion before the slash is the actual WAL file itself that it got to start replaying, usually the last few characters of the WAL file name. The portion after the slash is the position in the WAL file itself. Default size of each WAL is 16MB, so if that first portion doesn't match the primary, you can count back that many WAL files x16MB. If it's on the same WAL file, I'm not really sure how to get a more precise byte lag than that.KeithOn Mon, Jun 18, 2018 at 9:59 AM, Debraj Manna <subharaj.manna@xxxxxxxxx> wrote:Thanks Keith I am looking something like the solution you suggested.Once I run pg_controldata on the slave data directory it will be give me the latest checkpoint like below. Can you let me know a bit more detail on how can I get the approximate lag in the slave in bytes from this?Latest checkpoint location: 2F9/B38DE758On Mon 18 Jun, 2018, 6:53 PM Keith, <keith@xxxxxxxxxxx> wrote:You can use pg_controldata on any data directory to get its rough position in WAL replay, even when the database isn't runningOn Mon, Jun 18, 2018 at 8:56 AM, Fabio Pardi <f.pardi@xxxxxxxxxxxx> wrote:That I know, if standby is down, you cannot know exactly the last applied transaction log file.
But you could get a gross estimate if you check how many WAL files is the standby behind, just listing the WAL files on the master and on the standby.
Given that you can login on the standby machine.
regards,
fabio pardi
On 18/06/18 14:43, Debraj Manna wrote:
Is it possible to figure out the replication lag from transaction log files in the slave and master without querying postgres on the slave? Basically figuring out the replication lag in a slave when the postgres is down in that slave.