We've seen unexpected termination of the WAL receiver process. This stops streaming replication, but the replica stays available -- restarting the server resumes streaming replication where it left off. We've seen this across nearly every recent version of PG, (9.4, 9.5, 11.x, 12.x) -- anything omitted is one we haven't used. I don't have an explanation for the cause, but I was able to set logging to "debug5" and run an strace of the walrecevier PID when it eventually happened. It appears as if the SIGTERM is coming from the "postgres: startup" process. The only notable thing that I've seen discussion of (in the past) and where it might be relevant here is that these replicas all reside on a ZFS filesystem (on Linux). In fact, everywhere we've seen issues, the database sits on a ZFS pool. We also have another PG cluster running on non-ZFS (ext4) and have never experienced this problem. Any suggestions? Logging snippets are below: postgresql.log Apr 23 09:09:13 royce postgres[2229]: [43141904-1] 2020-04-23 09:09:13.663 GMT [2229] DEBUG: record known xact 8677704 latestObservedXid 8677704 Apr 23 09:09:13 royce postgres[2229]: [43141904-2] 2020-04-23 09:09:13.663 GMT [2229] CONTEXT: WAL redo at 2D6/C259A6E0 for Btree/INSERT_LEAF: off 88 Apr 23 09:09:13 royce postgres[2229]: [43141905-1] 2020-04-23 09:09:13.663 GMT [2229] DEBUG: record known xact 8677704 latestObservedXid 8677704 Apr 23 09:09:13 royce postgres[2229]: [43141905-2] 2020-04-23 09:09:13.663 GMT [2229] CONTEXT: WAL redo at 2D6/C259A720 for Heap/HEAP_CONFIRM: off 6 Apr 23 09:09:13 royce postgres[2229]: [43141906-1] 2020-04-23 09:09:13.663 GMT [2229] DEBUG: record known xact 8677704 latestObservedXid 8677704 Apr 23 09:09:13 royce postgres[2229]: [43141906-2] 2020-04-23 09:09:13.663 GMT [2229] CONTEXT: WAL redo at 2D6/C259A750 for Heap/INSERT: off 7 flags 0x0C Apr 23 09:09:13 royce postgres[2229]: [43141907-1] 2020-04-23 09:09:13.663 GMT [2229] LOG: incorrect resource manager data checksum in record at 2D6/C259AB90 Apr 23 09:09:13 royce postgres[2299]: [1448555-1] 2020-04-23 09:09:13.663 GMT [2299] FATAL: terminating walreceiver process due to administrator command Apr 23 09:09:13 royce postgres[2299]: [1448556-1] 2020-04-23 09:09:13.663 GMT [2299] DEBUG: shmem_exit(1): 1 before_shmem_exit callbacks to make Apr 23 09:09:13 royce postgres[2299]: [1448557-1] 2020-04-23 09:09:13.663 GMT [2299] DEBUG: shmem_exit(1): 6 on_shmem_exit callbacks to make Apr 23 09:09:13 royce postgres[2229]: [43141908-1] 2020-04-23 09:09:13.664 GMT [2229] DEBUG: switched WAL source from stream to archive after failure Apr 23 09:09:13 royce postgres[2229]: [43141909-1] 2020-04-23 09:09:13.664 GMT [2229] LOG: incorrect resource manager data checksum in record at 2D6/C259AB90 Apr 23 09:09:13 royce postgres[2299]: [1448558-1] 2020-04-23 09:09:13.664 GMT [2299] DEBUG: proc_exit(1): 2 callbacks to make Apr 23 09:09:13 royce postgres[2229]: [43141910-1] 2020-04-23 09:09:13.664 GMT [2229] DEBUG: switched WAL source from archive to stream after failure Apr 23 09:09:13 royce postgres[2229]: [43141911-1] 2020-04-23 09:09:13.664 GMT [2229] LOG: incorrect resource manager data checksum in record at 2D6/C259AB90 Apr 23 09:09:13 royce postgres[2299]: [1448559-1] 2020-04-23 09:09:13.664 GMT [2299] DEBUG: exit(1) Apr 23 09:09:13 royce postgres[2229]: [43141912-1] 2020-04-23 09:09:13.664 GMT [2229] DEBUG: switched WAL source from stream to archive after failure Apr 23 09:09:13 royce postgres[2229]: [43141913-1] 2020-04-23 09:09:13.664 GMT [2229] DEBUG: incorrect resource manager data checksum in record at 2D6/C259AB90 Apr 23 09:09:13 royce postgres[2229]: [43141914-1] 2020-04-23 09:09:13.664 GMT [2229] DEBUG: switched WAL source from archive to stream after failure Apr 23 09:09:13 royce postgres[2299]: [1448560-1] 2020-04-23 09:09:13.664 GMT [2299] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make Apr 23 09:09:13 royce postgres[2229]: [43141915-1] 2020-04-23 09:09:13.664 GMT [2229] LOG: incorrect resource manager data checksum in record at 2D process id list: 2224 ? Ss 0:24 /usr/lib/postgresql/12/bin/postgres -D /var/lib/postgresql/12/main -c unix_socket_directories=/var/run/postgresql -c config_file=/etc/postgresql/12/main/postgresql.conf -c hba_file=/etc/postgresql/12/main/pg_hba.conf -c ident_file=/etc/postgresql/12/main/pg_ident.conf -c external_pid_file=/var/run/postgresql/12-main.pid 2229 ? Ss 7:02 postgres: startup 2230 ? Ss 6:09 postgres: checkpointer 2231 ? Ss 0:08 postgres: background writer 2298 ? Ss 1:29 postgres: stats collector 2299 ? Ss 6:11 postgres: walreceiver strace of walreceiver: kill(2229, SIGUSR1) = 0 getpid() = 2299 sendto(3<socket:[208548671]>, "<135>Apr 23 09:09:13 postgres[2299]: [1448554-1] 2020-04-23 09:09:13.663 GMT [2299] DEBUG: sending write 2D6/C25A0000 flush 2D6/C25A0000 apply 2D6/C25915F0", 156, MSG_NOSIGNAL, NULL, 0) = 156 sendto(4<socket:[208548975]>, "\x64\x00\x00\x00\x26\x72\x00\x00\x02\xd6\xc2\x5a\x00\x00\x00\x00\x02\xd6\xc2\x5a\x00\x00\x00\x00\x02\xd6\xc2\x59\x15\xf0\x00\x02\x46\xf0\x5f\x04\xbe\x9e\x00", 39, MSG_NOSIGNAL, NULL, 0) = 39 epoll_create1(EPOLL_CLOEXEC) = 6<anon_inode:[eventpoll]> epoll_ctl(6<anon_inode:[eventpoll]>, EPOLL_CTL_ADD, 10<pipe:[208548970]>, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=347640776, u64=94863290307528}}) = 0 epoll_ctl(6<anon_inode:[eventpoll]>, EPOLL_CTL_ADD, 7<pipe:[208548681]>, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=347640800, u64=94863290307552}}) = 0 epoll_ctl(6<anon_inode:[eventpoll]>, EPOLL_CTL_ADD, 4<socket:[208548975]>, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=347640824, u64=94863290307576}}) = 0 epoll_wait(6<anon_inode:[eventpoll]>, 0x564714b89410, 1, 100) = -1 EINTR (Interrupted system call) --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=2229, si_uid=112} --- write(11<pipe:[208548970]>, "\x00", 1) = 1 rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call) close(6<anon_inode:[eventpoll]>) = 0 getpid() = 2299 sendto(3<socket:[208548671]>, "<131>Apr 23 09:09:13 postgres[2299]: [1448555-1] 2020-04-23 09:09:13.663 GMT [2299] FATAL: terminating walreceiver process due to administrator command", 152, MSG_NOSIGNAL, NULL, 0) = 152 getpid() = 2299 sendto(3<socket:[208548671]>, "<135>Apr 23 09:09:13 postgres[2299]: [1448556-1] 2020-04-23 09:09:13.663 GMT [2299] DEBUG: shmem_exit(1): 1 before_shmem_exit callbacks to make", 144, MSG_NOSIGNAL, NULL, 0) = 144 getpid() = 2299 sendto(3<socket:[208548671]>, "<135>Apr 23 09:09:13 postgres[2299]: [1448557-1] 2020-04-23 09:09:13.663 GMT [2299] DEBUG: shmem_exit(1): 6 on_shmem_exit callbacks to make", 140, MSG_NOSIGNAL, NULL, 0) = 140 sendto(4<socket:[208548975]>, "\x58\x00\x00\x00\x04", 5, MSG_NOSIGNAL, NULL, 0) = 5 Any suggestions? Thanks- Justin