On 07/25/2018 08:40 AM, Dimitri Maziuk wrote:
On 7/25/2018 10:28 AM, Andres Freund wrote:
Are you really expecting us to be able to reproduce the problem based on
the above description? Our test suites do set up plain replication
setups, and the problem doesn't occur there.
I don't, by definition, have a reproducible case: it only happened once
so far.
Were you using pg_export_snapshot() by any chance?
https://www.postgresql.org/docs/10/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION
Were there any relevant error messages in the log before the database hung?
If nobody knows what limits the number of files created in
$PGDATA/pg_logical/snapshots then we'll all have to wait until this
happens again.
(To somebody else as I'm obviously not turning logical replication back
on until I know it won't kill my server again.)
Given that it took 3 weeks to manifest itself before, I would say give
it a try and monitor $PGDATA/pg_logical/snapshots. That would help
provide information for getting at the source of the problem. You can
always disable the replication if it looks like it is running away.
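
For what it's worth, here is a minimal sketch of that kind of monitoring in
Python. It only counts the regular files under pg_logical/snapshots in a given
data directory. The function names and the limit of 1000 are made up for
illustration; nothing in this thread says what a normal file count is, so pick
a threshold from what you observe on a healthy server.

```python
import os
from pathlib import Path


def count_snapshot_files(pgdata):
    """Count regular files in <pgdata>/pg_logical/snapshots."""
    snap_dir = Path(pgdata) / "pg_logical" / "snapshots"
    return sum(1 for entry in os.scandir(snap_dir) if entry.is_file())


def running_away(count, limit=1000):
    """Return True if the file count exceeds the limit.

    The default limit is an arbitrary placeholder, not a known
    PostgreSQL threshold.
    """
    return count > limit
```

Run from cron with something like count_snapshot_files(os.environ["PGDATA"])
and log the result, so that there is a history to look at if the count starts
climbing again.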
Dima
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx