Hello,
Thank you again for the suggestion. I configured tail_n_mail on
my SLES12
machine as follows as everything was ok:
LOG_LINE_PREFIX:
'%t %e
'
EMAIL:
someone@xxxxxxxxxxx
MAILSUBJECT: Acme
HOST Postgres
errors UNIQUE : NUMBER
INCLUDE: ERROR:
terminating
logical replication worker due to timeout
INCLUDE: LOG:
worker
process: logical replication worker for subscription [0-9]+
\({1}PID [0-9]+\){1}
exited with exit code [0-9]+
FILE:
/var/lib/pgsql/pg_log/LATEST
So I configured LOG_LINE_PREFIX
equal to log_prefix_line from postgresql.conf.
So tail_n_mail was able to catch such entries as this one:
2018-11-21
12:41:32 FET
00000
LOG: worker
process: logical replication worker for subscription 16386 (PID
31777)
exited with exit code 1
But after I have updated my
SLES12 to
SP3 tail_n_mail stopped noticing these entry.
2018-11-21
12:41:32 +03
00000
LOG: worker
process: logical replication worker for subscription 16386 (PID
31777)
exited with exit code 1
The reason is that in different output of timezone: FET
was before and +03 is now.
Maybe (due to the upgrade) you are missing some package having to do
with timezone data?
I have two questions in this regard.
I understand that currently tail_n_mail is not completely
perfect detector.
Is there any other robust, reliable way of logical replication
monitoring?
tail_n_mail is the only one I found. In my case it doesn't support :
%c (session id), I filed an issue.
Or, you could easily hack the source to fix this regexp from
\\w\\w\\w\\w? into smth that matches +03 (e.g. \\+\\d\\d) and make
it work.
Is it possible to configure any particular timestamp format in
postgresql
for its logs? As far as I know it can be only turned on or off
depending
on %t in log_prefix_line.
Thank you in advance,
Andrei Yahorau
From:
Andrei Yahorau/IBA
To:
pgsql-admin@xxxxxxxxxxxxxx,
Cc:
Mikalai Keida/IBA@IBA
Date:
24/08/2018 11:49
Subject:
Re: Logical
replication monitoring
Hello,
Thank you for the suggestion.
I increased wal_receiver_timeout
, wal_sender_timeout parameters
and now this error does not occur.
I installed tail_n_mail utility, made a simple config started in
debug
mode.
I am constantly facing the same error:
WARNING! Skipping
non-existent file
"/var/lib/pgsql/pg_log/postgresql.log-2018-08-23_154034"
Too many loops (20161): bailing:
The configuration file tail_n_mail.conf is quiet standart:
EMAIL:
someone@xxxxxxxxxxx
PGLOG: log
MAILSUBJECT: Acme HOST Postgres
errors
UNIQUE : NUMBER
INCLUDE: ERROR:
INCLUDE: FATAL:
INCLUDE: PANIC:
FILE1:
/var/lib/pgsql/pg_log/postgresql.log-%Y-%m-%d_%H%M%S
LASTFILE1:
/var/lib/pgsql/pg_log/postgresql.log-2018-08-23_154034
Could you please say is there
anything
wrong in my configuration or script usage?
Thank you,
Andrei Yahorau
From:
Andrei Yahorau/IBA
To:
pgsql-admin@xxxxxxxxxxxxxx,
Cc:
Mikalai Keida/IBA@IBA
Date:
13/08/2018 13:16
Subject:
Re: Logical
replication monitoring
Hello!
Thank you for your suggestion.
I afraid this approach is not suitable for me. As a rule my
postgresql
log on subscriber side contains a bunch of the following
entries:
ERROR: terminating logical replication worker due to timeout
00000 LOG: worker process:
logical replication worker for subscription 24578 (PID 6217)
exited with
exit code 1
How should I handle this situation?
As I understand this is quite normal situation. But why is
severity
for it an ERROR ?
I have another assumption. Could you correct me if I am wrong.
I found out in the source code that logical replication worker
termination
depends on wal_receiver_timeout paramer.
So I propose setting wal_receiver_timeout to 0.
In this case I think that monitoring of the following views pg_stat_subscription,
pg_publication and pg_stat_replication is
enough.
In case if there is some problem
with connection or with replication pg_stat_replication will
show nothing because wal sender will not be working otherwise it
will give
some information.
Am I right? Are there any vulnerabilities in this approach ?
Best regards,
Andrei Yahorau
From:
Andrei Yahorau/IBA
To:
pgsql-admin@xxxxxxxxxxxxxx,
Cc:
Mikalai Keida/IBA@IBA
Date:
10/08/2018 13:05
Subject:
Logical replication
monitoring
Hello PostgreSQL Community!
I configured logical replication
for
PostgreSQL 10.4 on 2 machines, set wal_level to logical, created
a publication
on master node and created a subscription on standby node
according to
the PostgreSQL documentation.
Could you please suggest an
approach
for replication state monitoring.
According to my experience the
monitoring
of pg_stat_subscription and pg_publication,
pg_replication_slots
unfortunately is not enough for this aim. Moreover standby
database
does not prohibit write operations by default and it can lead to
some inconsistency
between these databases.
For example a chain of queries as
SELECT pg_is_is_recovery(),
SELECT * FROM
pg_stat_replication
and
SELECT * FROM
pg_stat_wal_receiver
provide insight into replication
state
for hot_standby replication.
So is there a reliable way of
replication
state monitoring for logical replication?
Best regards,
Andrei Yahorau
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
|