Search Postgresql Archives

Re: Logical replication fails when adding multiple replicas

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for the response Hou,

I've had a look and when the tablesync workers are spinning up there are some errors of the form:

"2023-03-17 18:37:06.900 UTC [4071] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onspd"" has started"
"2023-03-17 18:37:06.976 UTC [4071] ERROR:  could not create replication slot ""pg_37986_sync_37922_7210774007126708177"": ERROR:  replication slot ""pg_37986_sync_37922_7210774007126708177"" already exists"
"2023-03-17 18:37:06.979 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4071) exited with exit code 1"


However further down a slot for the same table is successfully created:

"2023-03-17 18:38:41.232 UTC [4098] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onspd"" has started"

And it is not that table that gets stuck:

"2023-03-17 18:40:08.130 UTC [4052] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37663_7210774007126708177"" does not exist"
"2023-03-17 18:40:08.136 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4052) exited with exit code 1"
"2023-03-17 18:40:08.884 UTC [4149] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_uprntocouncil"" has started"
"2023-03-17 18:40:08.968 UTC [4149] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37706_7210774007126708177"" does not exist"
"2023-03-17 18:40:08.971 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4149) exited with exit code 1"
"2023-03-17 18:40:08.975 UTC [4150] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_address"" has started"

I'm currently working around the issue by waiting until all tables are in 'STATE_READY' or 'STATE_FINISHED' then dropping the subscription, and recreating it with 'copy_data=false' - https://github.com/DemocracyClub/UK-Polling-Stations/blob/aada9def5102f44602dfaca420bb562a8d725870/deploy/files/scripts/setup_db_replication.sh#L46-L48

I've pasted in the complete logs from where I took the examples below, in case you see something I've missed

Cheers,
Will

PS
The fujitsu postgres blog (https://www.postgresql.fastware.com/blog) is great - thank you.

LOGS:

1679078208511,"2023-03-17 18:36:48.409 UTC [3964] LOG:  starting PostgreSQL 14.7 (Ubuntu 14.7-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit"
1679078208511,"2023-03-17 18:36:48.409 UTC [3964] LOG:  listening on IPv4 address ""127.0.0.1"", port 5432"
1679078208511,"2023-03-17 18:36:48.409 UTC [3964] LOG:  listening on Unix socket ""/var/run/postgresql/.s.PGSQL.5432"""
1679078208511,2023-03-17 18:36:48.411 UTC [3965] LOG:  database system was shut down at 2023-03-17 18:36:48 UTC
1679078212980,2023-03-17 18:36:48.416 UTC [3964] LOG:  database system is ready to accept connections
1679078215787,"2023-03-17 18:36:55.701 UTC [4021] LOG:  logical replication apply worker for subscription ""polling_stations_0561a02f66363d911"" has started"
1679078215787,"2023-03-17 18:36:55.751 UTC [4022] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onsud"" has started"
1679078216038,"2023-03-17 18:36:55.769 UTC [4023] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""auth_user_groups"" has started"
1679078216038,"2023-03-17 18:36:55.911 UTC [4022] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onsud"" has finished"
1679078216038,"2023-03-17 18:36:55.915 UTC [4027] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""file_uploads_upload"" has started"
1679078216038,"2023-03-17 18:36:55.924 UTC [4023] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""auth_user_groups"" has finished"
1679078216038,"2023-03-17 18:36:55.936 UTC [4028] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""auth_user_user_permissions"" has started"
1679078216289,"2023-03-17 18:36:56.035 UTC [4027] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""file_uploads_upload"" has finished"
1679078216289,"2023-03-17 18:36:56.045 UTC [4029] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""councils_usercouncils"" has started"
1679078216289,"2023-03-17 18:36:56.057 UTC [4028] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""auth_user_user_permissions"" has finished"
1679078216289,"2023-03-17 18:36:56.065 UTC [4030] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""data_finder_loggedpostcode"" has started"
1679078216289,"2023-03-17 18:36:56.159 UTC [4029] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""councils_usercouncils"" has finished"
1679078216289,"2023-03-17 18:36:56.167 UTC [4032] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""data_importers_dataquality"" has started"
1679078216289,"2023-03-17 18:36:56.192 UTC [4030] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""data_finder_loggedpostcode"" has finished"
1679078216540,"2023-03-17 18:36:56.200 UTC [4033] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_address"" has started"
1679078216540,"2023-03-17 18:36:56.294 UTC [4032] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""data_importers_dataquality"" has finished"
1679078216540,"2023-03-17 18:36:56.301 UTC [4035] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""pollingstations_customfinder"" has started"
1679078216540,"2023-03-17 18:36:56.374 UTC [4033] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_address"" has finished"
1679078216540,"2023-03-17 18:36:56.382 UTC [4036] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""file_uploads_file"" has started"
1679078216540,"2023-03-17 18:36:56.412 UTC [4035] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""pollingstations_customfinder"" has finished"
1679078216790,"2023-03-17 18:36:56.420 UTC [4039] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""spatial_ref_sys"" has started"
1679078216790,"2023-03-17 18:36:56.546 UTC [4036] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""file_uploads_file"" has finished"
1679078216790,"2023-03-17 18:36:56.551 UTC [4041] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_uprntocouncil"" has started"
1679078216790,"2023-03-17 18:36:56.604 UTC [4041] ERROR:  could not create replication slot ""pg_37986_sync_37706_7210774007126708177"": ERROR:  replication slot ""pg_37986_sync_37706_7210774007126708177"" already exists"
1679078216790,"2023-03-17 18:36:56.608 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4041) exited with exit code 1"
1679078216790,"2023-03-17 18:36:56.616 UTC [4042] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""auth_group_permissions"" has started"
1679078216790,"2023-03-17 18:36:56.704 UTC [4042] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""auth_group_permissions"" has finished"
1679078217041,"2023-03-17 18:36:56.717 UTC [4044] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""django_migrations"" has started"
1679078217041,"2023-03-17 18:36:56.812 UTC [4039] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""spatial_ref_sys"" has finished"
1679078217041,"2023-03-17 18:36:56.816 UTC [4047] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""pollingstations_pollingstation"" has started"
1679078217041,"2023-03-17 18:36:56.825 UTC [4044] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""django_migrations"" has finished"
1679078217041,"2023-03-17 18:36:56.841 UTC [4048] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""django_admin_log"" has started"
1679078217041,"2023-03-17 18:36:56.970 UTC [4048] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""django_admin_log"" has finished"
1679078217292,"2023-03-17 18:36:56.978 UTC [4052] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_address"" has started"
1679078217292,"2023-03-17 18:36:57.173 UTC [4047] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""pollingstations_pollingstation"" has finished"
1679078217542,"2023-03-17 18:36:57.183 UTC [4056] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""django_session"" has started"
1679078217542,"2023-03-17 18:36:57.323 UTC [4056] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""django_session"" has finished"
1679078217792,"2023-03-17 18:36:57.356 UTC [4057] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""pollingstations_pollingdistrict"" has started"
1679078217792,"2023-03-17 18:36:57.587 UTC [4057] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""pollingstations_pollingdistrict"" has finished"
1679078218043,"2023-03-17 18:36:57.661 UTC [4058] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""auth_user"" has started"
1679078218043,"2023-03-17 18:36:57.821 UTC [4058] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""auth_user"" has finished"
1679078218043,"2023-03-17 18:36:57.835 UTC [4059] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""councils_council"" has started"
1679078218293,"2023-03-17 18:36:58.021 UTC [4059] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""councils_council"" has finished"
1679078222980,"2023-03-17 18:36:58.112 UTC [4061] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""councils_councilgeography"" has started"
1679078224057,"2023-03-17 18:37:03.822 UTC [4061] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""councils_councilgeography"" has finished"
1679078226313,"2023-03-17 18:37:03.834 UTC [4065] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""feedback_feedback"" has started"
1679078226313,"2023-03-17 18:37:06.177 UTC [4065] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""feedback_feedback"" has finished"
1679078226563,"2023-03-17 18:37:06.184 UTC [4066] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""django_site"" has started"
1679078226563,"2023-03-17 18:37:06.322 UTC [4066] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""django_site"" has finished"
1679078226563,"2023-03-17 18:37:06.328 UTC [4067] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""django_content_type"" has started"
1679078226563,"2023-03-17 18:37:06.462 UTC [4067] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""django_content_type"" has finished"
1679078226814,"2023-03-17 18:37:06.467 UTC [4068] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""auth_permission"" has started"
1679078226814,"2023-03-17 18:37:06.590 UTC [4068] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""auth_permission"" has finished"
1679078226814,"2023-03-17 18:37:06.596 UTC [4069] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""authtoken_token"" has started"
1679078226814,"2023-03-17 18:37:06.741 UTC [4069] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""authtoken_token"" has finished"
1679078227065,"2023-03-17 18:37:06.747 UTC [4070] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""auth_group"" has started"
1679078227065,"2023-03-17 18:37:06.892 UTC [4070] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""auth_group"" has finished"
1679078227065,"2023-03-17 18:37:06.900 UTC [4071] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onspd"" has started"
1679078227065,"2023-03-17 18:37:06.976 UTC [4071] ERROR:  could not create replication slot ""pg_37986_sync_37922_7210774007126708177"": ERROR:  replication slot ""pg_37986_sync_37922_7210774007126708177"" already exists"
1679078227065,"2023-03-17 18:37:06.979 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4071) exited with exit code 1"
1679078227316,"2023-03-17 18:37:06.983 UTC [4072] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""pollingstations_advancevotingstation"" has started"
1679078227316,"2023-03-17 18:37:07.196 UTC [4072] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""pollingstations_advancevotingstation"" has finished"
1679078227566,"2023-03-17 18:37:07.201 UTC [4073] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""bug_reports_bugreport"" has started"
1679078227566,"2023-03-17 18:37:07.370 UTC [4073] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""bug_reports_bugreport"" has finished"
1679078231981,"2023-03-17 18:37:07.377 UTC [4074] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_uprntocouncil"" has started"
1679078238098,2023-03-17 18:37:17.998 UTC [3966] LOG:  checkpoints are occurring too frequently (27 seconds apart)
1679078242980,"2023-03-17 18:37:17.998 UTC [3966] HINT:  Consider increasing the configuration parameter ""max_wal_size""."
1679078261904,2023-03-17 18:37:41.895 UTC [3966] LOG:  checkpoints are occurring too frequently (24 seconds apart)
1679078265980,"2023-03-17 18:37:41.895 UTC [3966] HINT:  Consider increasing the configuration parameter ""max_wal_size""."
1679078284964,2023-03-17 18:38:04.796 UTC [3966] LOG:  checkpoints are occurring too frequently (23 seconds apart)
1679078288981,"2023-03-17 18:38:04.796 UTC [3966] HINT:  Consider increasing the configuration parameter ""max_wal_size""."
1679078306522,2023-03-17 18:38:26.427 UTC [3966] LOG:  checkpoints are occurring too frequently (22 seconds apart)
1679078310980,"2023-03-17 18:38:26.427 UTC [3966] HINT:  Consider increasing the configuration parameter ""max_wal_size""."
1679078320313,"2023-03-17 18:38:40.216 UTC [4074] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37706_7210774007126708177"" does not exist"
1679078321316,"2023-03-17 18:38:40.222 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4074) exited with exit code 1"
1679078325075,"2023-03-17 18:38:41.232 UTC [4098] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onspd"" has started"
1679078325075,2023-03-17 18:38:44.830 UTC [3966] LOG:  checkpoints are occurring too frequently (18 seconds apart)
1679078329981,"2023-03-17 18:38:44.830 UTC [3966] HINT:  Consider increasing the configuration parameter ""max_wal_size""."
1679078341118,2023-03-17 18:39:01.053 UTC [3966] LOG:  checkpoints are occurring too frequently (17 seconds apart)
1679078345981,"2023-03-17 18:39:01.053 UTC [3966] HINT:  Consider increasing the configuration parameter ""max_wal_size""."
1679078355665,2023-03-17 18:39:15.522 UTC [3966] LOG:  checkpoints are occurring too frequently (14 seconds apart)
1679078359981,"2023-03-17 18:39:15.522 UTC [3966] HINT:  Consider increasing the configuration parameter ""max_wal_size""."
1679078369702,2023-03-17 18:39:29.501 UTC [3966] LOG:  checkpoints are occurring too frequently (14 seconds apart)
1679078373981,"2023-03-17 18:39:29.501 UTC [3966] HINT:  Consider increasing the configuration parameter ""max_wal_size""."
1679078384001,2023-03-17 18:39:43.960 UTC [3966] LOG:  checkpoints are occurring too frequently (14 seconds apart)
1679078388980,"2023-03-17 18:39:43.960 UTC [3966] HINT:  Consider increasing the configuration parameter ""max_wal_size""."
1679078398285,2023-03-17 18:39:58.227 UTC [3966] LOG:  checkpoints are occurring too frequently (15 seconds apart)
1679078402981,"2023-03-17 18:39:58.227 UTC [3966] HINT:  Consider increasing the configuration parameter ""max_wal_size""."
1679078408316,"2023-03-17 18:40:08.130 UTC [4052] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37663_7210774007126708177"" does not exist"
1679078409068,"2023-03-17 18:40:08.136 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4052) exited with exit code 1"
1679078409069,"2023-03-17 18:40:08.884 UTC [4149] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_uprntocouncil"" has started"
1679078409069,"2023-03-17 18:40:08.968 UTC [4149] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37706_7210774007126708177"" does not exist"
1679078409069,"2023-03-17 18:40:08.971 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4149) exited with exit code 1"
1679078409069,"2023-03-17 18:40:08.975 UTC [4150] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_address"" has started"
1679078409069,"2023-03-17 18:40:09.040 UTC [4150] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37663_7210774007126708177"" does not exist"
1679078413980,"2023-03-17 18:40:09.043 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4150) exited with exit code 1"
1679078414331,"2023-03-17 18:40:14.053 UTC [4154] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_uprntocouncil"" has started"
1679078414331,"2023-03-17 18:40:14.108 UTC [4154] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37706_7210774007126708177"" does not exist"
1679078414331,"2023-03-17 18:40:14.111 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4154) exited with exit code 1"
1679078414331,"2023-03-17 18:40:14.115 UTC [4155] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_address"" has started"
1679078414331,"2023-03-17 18:40:14.168 UTC [4155] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37663_7210774007126708177"" does not exist"
1679078415586,"2023-03-17 18:40:14.171 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4155) exited with exit code 1"
1679078415586,2023-03-17 18:40:15.511 UTC [3966] LOG:  checkpoints are occurring too frequently (17 seconds apart)
1679078419982,"2023-03-17 18:40:15.511 UTC [3966] HINT:  Consider increasing the configuration parameter ""max_wal_size""."
1679078420098,"2023-03-17 18:40:19.930 UTC [4159] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_uprntocouncil"" has started"
1679078420098,"2023-03-17 18:40:19.998 UTC [4159] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37706_7210774007126708177"" does not exist"
1679078420098,"2023-03-17 18:40:20.001 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4159) exited with exit code 1"
1679078420349,"2023-03-17 18:40:20.004 UTC [4160] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_address"" has started"
1679078420349,"2023-03-17 18:40:20.138 UTC [4160] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37663_7210774007126708177"" does not exist"
1679078424980,"2023-03-17 18:40:20.141 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4160) exited with exit code 1"
1679078425616,"2023-03-17 18:40:25.400 UTC [4164] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_uprntocouncil"" has started"
1679078425616,"2023-03-17 18:40:25.481 UTC [4164] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37706_7210774007126708177"" does not exist"
1679078425616,"2023-03-17 18:40:25.484 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4164) exited with exit code 1"
1679078425616,"2023-03-17 18:40:25.487 UTC [4165] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_address"" has started"
1679078425616,"2023-03-17 18:40:25.563 UTC [4165] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37663_7210774007126708177"" does not exist"
1679078429981,"2023-03-17 18:40:25.566 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4165) exited with exit code 1"
1679078431133,"2023-03-17 18:40:31.093 UTC [4169] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_uprntocouncil"" has started"
1679078431383,"2023-03-17 18:40:31.130 UTC [4169] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37706_7210774007126708177"" does not exist"
1679078431383,"2023-03-17 18:40:31.134 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4169) exited with exit code 1"
1679078431383,"2023-03-17 18:40:31.138 UTC [4170] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""addressbase_address"" has started"
1679078431383,"2023-03-17 18:40:31.168 UTC [4170] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37663_7210774007126708177"" does not exist"
1679078434391,"2023-03-17 18:40:31.171 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4170) exited with exit code 1"
1679078434641,"2023-03-17 18:40:34.385 UTC [4098] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37922_7210774007126708177"" does not exist"
1679078435644,"2023-03-17 18:40:34.409 UTC [3964] LOG:  background worker ""logical replication worker"" (PID 4098) exited with exit code 1"
1679078435644,"2023-03-17 18:40:35.413 UTC [4173] LOG:  logical replication table synchronization worker for subscription ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onspd"" has started"
1679078435644,"2023-03-17 18:40:35.448 UTC [4173] ERROR:  could not start WAL streaming: ERROR:  replication slot ""pg_37986_sync_37922_7210774007126708177"" does not exist"


On Wed, Mar 22, 2023 at 1:49 AM houzj.fnst@xxxxxxxxxxx <houzj.fnst@xxxxxxxxxxx> wrote:
On Monday, March 20, 2023 8:46 PM Will Roper <will.roper@xxxxxxxxxxxxxxxxxxxx>  wrote:

Hi,

> We’re having some issues with Postgresql’s logical replication. Specifically
> trying to add several replicas at once. Essentially we can add replicas one at
> a time, but when we try and add two or more together some of the table
> subscriptions (as described in pg_subscription_rel) fail to get to
> ‘STATE_READY’ (‘r’). Here’s a bit more detail.
>
> How the problem manifests/what we observe When we try to add several replicas
> at the same time most of them have at least some tables/subscriptions (in the
> pg_subscription_rel) that get stuck in STATE_FINISHEDCOPY and never reach
> STATE_READY. The tables have the right number of rows, but the logs show
> postgres is in a constant cycle of errors like:
>
> 2023-03-18 16:00:19.974 UTC [4207] LOG:  logical replication table synchronization worker for subscription "polling_stations_01c7ee00f17021f94", table "uk_geo_utils_onspd" has started
> 2023-03-18 16:00:19.999 UTC [4207] ERROR:  could not start WAL streaming: ERROR:  replication slot "pg_37982_sync_37918_7210774007126708177" does not exist

Are there some other kind of ERROR messages on subscriber before the first "slot ... does
not exist" ? I think it's possible that some other ERROR happened when first time
setting the state to READY, and then the table sync keeps retrying and failing.

Best Regards,
Hou zj

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux