Search Postgresql Archives

BDR: SSL error: bad write retry

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



BDR-0.9.3 and PG-9.4.4 on Ubuntu 14.04

Two nodes, BDR replication. Cluster is newly created, no nodes have been removed from it. Making/deleting small tables works well across the cluster.

Now I'm trying to pg_restore a larger database from another system (pg_dump output file is 3.1 GB compressed). I am running pg_restore on one node in the cluster, and hoping that BDR would replicate changes to the other node.

It went well through creating schemas, extensions, comments, views, functions, sequences and tables. It imported data into tables. Now it's trying to create indexes - but it takes a very long time and I see errors in the logs.

I see periodic data transfers between instances. CPU usage shoots up, then back down. There's a cycle of a few minutes where these things occur.

On the instance where I'm running pg_restore, I see this in the logs:

###############################################
2015-11-03 21:45:35.328 UTC,"postgres","pgmirror",4918,"10.1.1.169:36334",56392a26.1336,5,"idle",2015-11-03 21:41:58 UTC,5/146,0,LOG,08P01,"SSL error: bad write retry",,,,,"slot ""bdr_16387_6212727469166484615_1_16387__"", output plugin ""bdr"", in the change callback, associated LSN 0/2B29E50",,,,"bdr (6212727469166484615,1,16387,):receive" 2015-11-03 21:45:35.328 UTC,"postgres","pgmirror",4918,"10.1.1.169:36334",56392a26.1336,6,"idle",2015-11-03 21:41:58 UTC,5/146,0,LOG,08006,"could not receive data from client: Connection reset by peer",,,,,"slot ""bdr_16387_6212727469166484615_1_16387__"", output plugin ""bdr"", in the change callback, associated LSN 0/2B29E50",,,,"bdr (6212727469166484615,1,16387,):receive" 2015-11-03 21:45:35.328 UTC,"postgres","pgmirror",4918,"10.1.1.169:36334",56392a26.1336,7,"idle",2015-11-03 21:41:58 UTC,5/146,0,LOG,08P01,"unexpected EOF on standby connection",,,,,"slot ""bdr_16387_6212727469166484615_1_16387__"", output plugin ""bdr"", in the change callback, associated LSN 0/2B29E50",,,,"bdr (6212727469166484615,1,16387,):receive" 2015-11-03 21:45:35.330 UTC,"postgres","pgmirror",4918,"10.1.1.169:36334",56392a26.1336,8,"idle",2015-11-03 21:41:58 UTC,,0,LOG,00000,"disconnection: session time: 0:03:37.234 user=postgres database=pgmirror host=10.1.1.169 port=36334",,,,,,,,,"bdr (6212727469166484615,1,16387,):receive" 2015-11-03 21:45:40.421 UTC,,,5066,"",56392b04.13ca,1,"",2015-11-03 21:45:40 UTC,,0,LOG,00000,"connection received: host=10.1.1.169 port=36335",,,,,,,,,"" 2015-11-03 21:45:40.427 UTC,"postgres","pgmirror",5066,"10.1.1.169:36335",56392b04.13ca,2,"authentication",2015-11-03 21:45:40 UTC,5/149,0,LOG,00000,"replication connection authorized: user=postgres SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, compression=off)",,,,,,,,,"" 2015-11-03 21:45:40.431 UTC,"postgres","pgmirror",5066,"10.1.1.169:36335",56392b04.13ca,3,"idle",2015-11-03 21:45:40 UTC,5/0,0,LOG,00000,"starting logical decoding for slot ""bdr_16387_6212727469166484615_1_16387__""","streaming transactions committing after 1/14FEDFE0, reading WAL from 0/187AFA0",,,,,,,,"bdr (6212727469166484615,1,16387,):receive" 2015-11-03 21:45:40.435 UTC,"postgres","pgmirror",5066,"10.1.1.169:36335",56392b04.13ca,4,"idle",2015-11-03 21:45:40 UTC,5/0,0,LOG,00000,"logical decoding found consistent point at 0/187AFA0","Logical decoding will begin using saved snapshot.",,,,,,,,"bdr (6212727469166484615,1,16387,):receive"
###############################################

On the other node in the cluster (supposed to receive the data via BDR) I see this in the logs:

###############################################
2015-11-03 21:45:31.885 UTC,,,3391,,56391e57.d3f,19,,2015-11-03 20:51:35 UTC,,0,LOG,00000,"checkpoint starting: xlog",,,,,,,,,"" 2015-11-03 21:45:35.400 UTC,,,4773,,56392a26.12a5,1,,2015-11-03 21:41:58 UTC,3/201,876,ERROR,XX000,"connection to other side has died",,,,,,,,,"bdr (6212727469166484615,1,16387,): apply" 2015-11-03 21:45:35.408 UTC,,,3388,,56391e56.d3c,34,,2015-11-03 20:51:34 UTC,,0,LOG,00000,"worker process: bdr (6212727469166484615,1,16387,)->bdr (6212727476664052871,1, (PID 4773) exited with exit code 1",,,,,,,,,"" 2015-11-03 21:45:40.412 UTC,,,3388,,56391e56.d3c,35,,2015-11-03 20:51:34 UTC,,0,LOG,00000,"starting background worker process ""bdr (6212727469166484615,1,16387,)->bdr (6212727476664052871,1,""",,,,,,,,,""
###############################################


--
Florin Andrei
http://florin.myip.org/


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux