Hi Adrian,
thank you very much for your patience. I apologise for the missing information.
On 9 March 2016 16:13:00 +01:00, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 03/09/2016 04:56 AM, fredrik@xxxxxxxxxxxxx wrote:
Hi Adrian,
thank you very much for your response.
I ran the "VACUUM ANALYZE" command on the master node.
Regarding log messages.
Here is the contents of the log (excluding connections/disconnections):

Assuming the below is from the replica database.
Yes, the "LOG: recovery has paused" message was indeed from the replica.
2016-02-22 02:30:08 GMT 24616 LOG: recovery has paused

So what happened to cause the above?
We automatically pause recovery on the replica before running pg_dump, to make certain that we get a consistent dump of the database.
I am not seeing anything below that indicates the recovery started again.
The reason we do not see a matching "resume" is that the pg_dump failed and our error handling was insufficient, so recovery was never resumed.
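A minimal sketch of how the resume could be guaranteed even when the dump fails, assuming the backup is driven from a Python wrapper around psql and pg_dump. This is not the actual backup script; the names `dump_with_paused_replay` and `run_checked` are hypothetical, and connection options are left out.

```python
import subprocess
from typing import Callable, Sequence

# A runner executes one command and raises if it fails.
Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(list(cmd), check=True)


def dump_with_paused_replay(dbname: str, outfile: str,
                            run: Runner = run_checked) -> None:
    """Pause WAL replay on the replica, run pg_dump, and always resume.

    pg_xlog_replay_pause()/pg_xlog_replay_resume() are the 9.x names;
    from PostgreSQL 10 on they are pg_wal_replay_pause()/pg_wal_replay_resume().
    """
    psql = ["psql", "-X", "-d", dbname, "-c"]
    run(psql + ["SELECT pg_xlog_replay_pause();"])
    try:
        run(["pg_dump", "-Fc", "-f", outfile, dbname])
    finally:
        # Executed on success and on failure, so a failed dump
        # cannot leave the replica paused.
        run(psql + ["SELECT pg_xlog_replay_resume();"])
```

The point of the `finally` block is that the pg_dump error still propagates to the caller, but only after the resume has been attempted.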
2016-02-22 02:30:08 GMT 24616 HINT: Execute pg_xlog_replay_resume() to continue.
2016-02-22 02:37:19 GMT 23859 DBNAME ERROR: missing chunk number 0 for toast value 2747579 in pg_toast_22066
2016-02-22 02:37:19 GMT 23859 DBNAME STATEMENT: COPY public.room_shape(room_uuid, data) TO stdout;
2016-02-22 02:37:41 GMT 2648 DBNAME LOG: could not receive data from client: Connection reset by peer
2016-02-22 02:37:41 GMT 2648 DBNAME LOG: unexpected EOF on client connection

What does the log from the master show?
It doesn't seem to show much. It does have these repeated messages, however:
2016-02-22 02:12:18 GMT 30908 LOG: using stale statistics instead of current ones because stats collector is not responding
2016-02-22 02:13:01 GMT 30908 LOG: using stale statistics instead of current ones because stats collector is not responding
2016-02-22 02:13:52 GMT 30908 LOG: using stale statistics instead of current ones because stats collector is not responding
There are lots of these messages within the timeframe. There seem to be a couple of them every 2-4 hours.
Best regards,
Fredrik Huitfeldt

On 7 March 2016 16:35:29 +01:00, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 03/06/2016 10:18 PM, fredrik@xxxxxxxxxxxxx wrote:
Hi All,
I would really appreciate any help I can get on this issue.
Basically, a pg_basebackup + streaming attach led to a database that we could not read from afterwards.

From original post:
"The issue remained until we ran a full vacuum analyze on the cluster."
Which cluster was that, the master or the slave?
"I have logfiles from the incident, but I cannot see anything out of the ordinary (despite having a fair amount of experience investigating postgresql logs)."
Can we see the section before and after ERROR?

Best regards,
Fredrik
PS please advise if this is better posted on another list.
--
Adrian Klaver
--
Adrian Klaver
Best regards,
Fredrik