Hello,
Thank you again for the suggestion. I configured tail_n_mail on my SLES12
machine as follows, and everything was OK:
LOG_LINE_PREFIX: '%t %e '
EMAIL: someone@xxxxxxxxxxx
MAILSUBJECT: Acme HOST Postgres errors UNIQUE : NUMBER
INCLUDE: ERROR: terminating logical replication worker due to timeout
INCLUDE: LOG: worker process: logical replication worker for subscription [0-9]+ \({1}PID [0-9]+\){1} exited with exit code [0-9]+
FILE: /var/lib/pgsql/pg_log/LATEST
So I set LOG_LINE_PREFIX equal to log_line_prefix from postgresql.conf.
With that, tail_n_mail was able to catch entries such as this one:
2018-11-21 12:41:32 FET 00000 LOG: worker process: logical replication worker for subscription 16386 (PID 31777) exited with exit code 1
But after I updated my SLES12 to SP3, tail_n_mail stopped noticing these entries:
2018-11-21 12:41:32 +03 00000 LOG: worker process: logical replication worker for subscription 16386 (PID 31777) exited with exit code 1
The reason is the different timezone output in the timestamp: it was FET
before and it is +03 now.
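To illustrate the mismatch outside tail_n_mail, here is a minimal Python sketch of a pattern that accepts either a timezone abbreviation or a numeric offset after the timestamp. It assumes the '%t %e ' prefix shown above; the names LOG_LINE and parse_worker_exit are made up for this example, and this is not how tail_n_mail itself matches lines.

```python
import re

# Assumed layout (from the '%t %e ' prefix above):
#   <date> <time> <timezone> <sqlstate> LOG: worker process: ...
# The timezone part may be an abbreviation (FET) or a numeric offset (+03).
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<tz>[A-Za-z]{2,5}|[+-]\d{2}(?::?\d{2})?) "
    r"(?P<sqlstate>[0-9A-Z]{5}) "
    r"LOG:\s+worker process: logical replication worker for subscription "
    r"(?P<subid>\d+) \(PID (?P<pid>\d+)\) exited with exit code (?P<code>\d+)$"
)


def parse_worker_exit(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    tail = ("00000 LOG: worker process: logical replication worker "
            "for subscription 16386 (PID 31777) exited with exit code 1")
    for stamp in ("2018-11-21 12:41:32 FET", "2018-11-21 12:41:32 +03"):
        print(parse_worker_exit(stamp + " " + tail))
```

Both sample lines from my log are matched by this pattern, so the check no longer depends on how the operating system names the timezone.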
I have two questions in this regard.
I understand that tail_n_mail is currently not a perfect detector.
Is there any other robust, reliable way of monitoring logical replication?
Is it possible to configure a particular timestamp format in PostgreSQL
for its logs? As far as I know, the timestamp can only be turned on or off
by including or omitting %t in log_line_prefix.
Thank you in advance,
Andrei Yahorau
From: Andrei Yahorau/IBA
To: pgsql-admin@xxxxxxxxxxxxxx
Cc: Mikalai Keida/IBA@IBA
Date: 24/08/2018 11:49
Subject: Re: Logical replication monitoring
Hello,
Thank you for the suggestion.
I increased the wal_receiver_timeout and wal_sender_timeout parameters,
and now this error does not occur.
I installed the tail_n_mail utility, made a simple config, and started it
in debug mode.
I am constantly facing the same error:
WARNING! Skipping non-existent file "/var/lib/pgsql/pg_log/postgresql.log-2018-08-23_154034"
Too many loops (20161): bailing:
The configuration file tail_n_mail.conf is quite standard:
EMAIL: someone@xxxxxxxxxxx
PGLOG: log
MAILSUBJECT: Acme HOST Postgres errors UNIQUE : NUMBER
INCLUDE: ERROR:
INCLUDE: FATAL:
INCLUDE: PANIC:
FILE1: /var/lib/pgsql/pg_log/postgresql.log-%Y-%m-%d_%H%M%S
LASTFILE1: /var/lib/pgsql/pg_log/postgresql.log-2018-08-23_154034
Could you please tell me whether there is anything wrong in my
configuration or in the way I use the script?
Thank you,
Andrei Yahorau
From: Andrei Yahorau/IBA
To: pgsql-admin@xxxxxxxxxxxxxx
Cc: Mikalai Keida/IBA@IBA
Date: 13/08/2018 13:16
Subject: Re: Logical replication monitoring
Hello!
Thank you for your suggestion.
I am afraid this approach is not suitable for me. As a rule, my PostgreSQL
log on the subscriber side contains a bunch of entries like these:
ERROR: terminating logical replication worker due to timeout
00000 LOG: worker process: logical replication worker for subscription 24578 (PID 6217) exited with exit code 1
How should I handle this situation?
As I understand it, this is quite a normal situation. But why is its
severity ERROR?
I have another assumption. Could you correct me if I am wrong?
I found in the source code that logical replication worker termination
depends on the wal_receiver_timeout parameter.
So I propose setting wal_receiver_timeout to 0.
In this case I think that monitoring the views pg_stat_subscription,
pg_publication and pg_stat_replication is enough.
If there is a problem with the connection or with replication,
pg_stat_replication will show nothing because the WAL sender will not be
running; otherwise it will return some information.
Am I right? Are there any weaknesses in this approach?
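As a rough sketch of the kind of check I have in mind on the subscriber side, the Python function below inspects rows taken from pg_stat_subscription (columns subname, pid and last_msg_receipt_time). The function name, the row-as-dict layout and the five-minute threshold are assumptions for illustration only; fetching the rows from the database is left out.

```python
from datetime import datetime, timedelta, timezone


def find_subscription_problems(sub_rows, now, max_silence=timedelta(minutes=5)):
    """Return a list of human-readable problems for the given subscription rows.

    sub_rows: iterable of dicts with the pg_stat_subscription columns
              'subname', 'pid' and 'last_msg_receipt_time'.
    now:      current time, timezone-aware like the timestamps in the rows.
    """
    problems = []
    for row in sub_rows:
        name = row["subname"]
        # A NULL pid means no worker is currently running for the subscription.
        if row["pid"] is None:
            problems.append(f"{name}: no apply worker running")
            continue
        last = row["last_msg_receipt_time"]
        # Hypothetical threshold: treat a long silence from the publisher as a problem.
        if last is None or now - last > max_silence:
            problems.append(f"{name}: no message from publisher since {last}")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"subname": "sub_ok", "pid": 6217,
         "last_msg_receipt_time": now - timedelta(seconds=30)},
        {"subname": "sub_down", "pid": None, "last_msg_receipt_time": None},
    ]
    for problem in find_subscription_problems(sample, now):
        print(problem)
```

Whether last_msg_receipt_time keeps advancing on an idle but healthy connection depends on the keepalive settings, so the threshold would have to be chosen with wal_sender_timeout in mind.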
Best regards,
Andrei Yahorau
From: Andrei Yahorau/IBA
To: pgsql-admin@xxxxxxxxxxxxxx
Cc: Mikalai Keida/IBA@IBA
Date: 10/08/2018 13:05
Subject: Logical replication monitoring
Hello PostgreSQL Community!
I configured logical replication for PostgreSQL 10.4 on 2 machines, set
wal_level to logical, created a publication on the master node and created
a subscription on the standby node according to the PostgreSQL documentation.
Could you please suggest an approach for monitoring the replication state?
In my experience, monitoring pg_stat_subscription, pg_publication and
pg_replication_slots is unfortunately not enough for this purpose. Moreover,
the standby database does not prohibit write operations by default, which
can lead to inconsistency between the two databases.
For example, a chain of queries such as
SELECT pg_is_in_recovery(),
SELECT * FROM pg_stat_replication
and
SELECT * FROM pg_stat_wal_receiver
provides insight into the replication state for hot_standby replication.
So is there a reliable way of monitoring the replication state for
logical replication?
Best regards,
Andrei Yahorau