I assume it is related to the following:

    LOG:  incorrect resource manager data checksum in record at 2D6/C259AB90

since the walreceiver terminates just after this, but I'm unclear what
precisely this means. Without digging into the code, I would guess that
it's unable to verify the checksum on the segment it just received from
the master. Since there are multiple replicas here, that would point to
an issue on this particular client. However, it happens everywhere -- we
have ~16 replicas across 3 different clusters (on different versions)
and we see this uniformly across them all at seemingly random times.
Also, just to clarify, this only ever happens on a single replica at a
time.

On Thu, Apr 23, 2020 at 2:46 PM Justin King <kingpin867@xxxxxxxxx> wrote:
>
> On Thu, Apr 23, 2020 at 12:47 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> >
> > Justin King <kingpin867@xxxxxxxxx> writes:
> > > We've seen unexpected termination of the WAL receiver process.  This
> > > stops streaming replication, but the replica stays available --
> > > restarting the server resumes streaming replication where it left off.
> > > We've seen this across nearly every recent version of PG (9.4, 9.5,
> > > 11.x, 12.x) -- anything omitted is one we haven't used.
> >
> > > I don't have an explanation for the cause, but I was able to set
> > > logging to "debug5" and run an strace of the walreceiver PID when it
> > > eventually happened.  It appears as if the SIGTERM is coming from the
> > > "postgres: startup" process.
> >
> > The startup process intentionally SIGTERMs the walreceiver under
> > various circumstances, so I'm not sure that there's any surprise
> > here.  Have you checked the postmaster log?
> >
> >                         regards, tom lane
>
> Yep, I included "debug5" output of the postmaster log in the initial post.