FATAL: terminating connection due to administrator command

Alban Hertroys <alban.hertroys@xxxxxxxxxxxxxxxxxxxx> · Thu, 1 Oct 2020 11:17:17 +0200

Hi all,

We're seeing the FATAL error message
from the subject pop up in our logs at regular intervals, but I haven't
been able to pinpoint what is causing it. I'm hoping for some insights
here.

We run a PostgreSQL 11.9 server on CentOS
7, within a vmware environment:

 PostgreSQL 11.9 on x86_64-pc-linux-gnu,
compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

The package was installed from the PGDG
repository.

I'm not even sure I should be worried,
there doesn't appear to be any impact on the servers' functioning, but
it does say 'FATAL'.

What we're seeing are lines like these
two instances:

2020-09-30 21:45:09.999 CEST [2375]
172.30.2.25 asdf STATEMENT:  select count(*) from "util_asdf"."v_something_something2"

2020-09-30 21:45:15.018 CEST [2375]
172.30.2.25 asdf ERROR:  canceling statement due to user request

2020-09-30 21:45:15.018 CEST [2375]
172.30.2.25 asdf STATEMENT:  select count(*) from "dm_asdf"."asdf_func"

2020-09-30 21:45:20.027 CEST [2375]
172.30.2.25 asdf ERROR:  canceling statement due to user request

2020-09-30 21:45:20.027 CEST [2375]
172.30.2.25 asdf STATEMENT:  select count(*) from "dm_asdf"."asdf_func"

2020-09-30 22:27:51.422 CEST [20270]
172.30.2.26 selfservice_sales ERROR:  canceling statement due to user
request

2020-09-30 22:27:51.422 CEST [20270]
172.30.2.26 selfservice_sales STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"

2020-09-30 22:27:51.422 CEST [30649]
  FATAL:  terminating connection due to administrator command

2020-09-30 22:27:51.422 CEST [30649]
  STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"

2020-09-30 22:27:51.422 CEST [30648]
  FATAL:  terminating connection due to administrator command

2020-09-30 22:27:51.422 CEST [30648]
  STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"

2020-09-30 22:27:51.422 CEST [30647]
  FATAL:  terminating connection due to administrator command

2020-09-30 22:27:51.422 CEST [30647]
  STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"

2020-09-30 22:27:51.422 CEST [30646]
  FATAL:  terminating connection due to administrator command

2020-09-30 22:27:51.422 CEST [30646]
  STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"

2020-09-30 22:27:51.422 CEST [30645]
  FATAL:  terminating connection due to administrator command

2020-09-30 22:27:51.422 CEST [30645]
  STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"

2020-09-30 22:27:51.435 CEST [6482]
  LOG:  background worker "parallel worker" (PID 30646)
exited with exit code 1

2020-09-30 22:27:51.435 CEST [6482]
  LOG:  background worker "parallel worker" (PID 30645)
exited with exit code 1

2020-09-30 22:27:51.435 CEST [6482]
  LOG:  background worker "parallel worker" (PID 30647)
exited with exit code 1

2020-09-30 22:27:51.435 CEST [6482]
  LOG:  background worker "parallel worker" (PID 30648)
exited with exit code 1

2020-09-30 22:27:51.435 CEST [6482]
  LOG:  background worker "parallel worker" (PID 30649)
exited with exit code 1

2020-09-30 22:27:56.446 CEST [20270]
172.30.2.26 selfservice_sales ERROR:  canceling statement due to user
request

2020-09-30 22:27:56.446 CEST [20270]
172.30.2.26 selfservice_sales STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"

2020-09-30 22:27:56.446 CEST [30659]
  FATAL:  terminating connection due to administrator command

2020-09-30 22:27:56.446 CEST [30659]
  STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"

2020-09-30 22:27:56.446 CEST [30658]
  FATAL:  terminating connection due to administrator command

2020-09-30 22:27:56.446 CEST [30658]
  STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"

2020-09-30 22:27:56.446 CEST [30657]
  FATAL:  terminating connection due to administrator command

2020-09-30 22:27:56.446 CEST [30657]
  STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"

2020-09-30 22:27:56.446 CEST [30656]
  FATAL:  terminating connection due to administrator command

2020-09-30 22:27:56.446 CEST [30656]
  STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"

2020-09-30 22:27:56.446 CEST [30655]
  FATAL:  terminating connection due to administrator command

2020-09-30 22:27:56.446 CEST [30655]
  STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"

2020-09-30 22:27:56.459 CEST [6482]
  LOG:  background worker "parallel worker" (PID 30655)
exited with exit code 1

2020-09-30 22:27:56.459 CEST [6482]
  LOG:  background worker "parallel worker" (PID 30656)
exited with exit code 1

2020-09-30 22:27:56.459 CEST [6482]
  LOG:  background worker "parallel worker" (PID 30657)
exited with exit code 1

2020-09-30 22:27:56.459 CEST [6482]
  LOG:  background worker "parallel worker" (PID 30658)
exited with exit code 1

2020-09-30 22:27:56.459 CEST [6482]
  LOG:  background worker "parallel worker" (PID 30659)
exited with exit code 1

2020-09-30 22:43:08.459 CEST [8055]
172.30.2.25 selfservice_prd ERROR:  schema "somethingelse"
does not exist at character 71

Apparently, something is sending SIGTERM
to our pg processes. I know that I'm not doing that, certainly not at those
hours, and I'm the one who set up this system and am the only DBA of it.

Advice I found on the Internet is to
use systemtap with some tap-script, but the scripts that I found just displayed
the PID's of processes without telling me their names, which I didn't find
all that useful in figuring out who was responsible, so I made an attempt
(I have no experience with stap) at modifying it to print process names
of signal sender and target:

/*

 * killsnoop-nd.stp    
Trace process signals.

 *        
             For Linux, uses SystemTap
(non-debuginfo).

 *

 * Copyright (C) 2015 Brendan Gregg.

(etc)

 */

global target;

global signal;

probe begin

{

        printf("%-6s
%-12s %-5s %-6s %6s\n", "FROM", "COMMAND", "SIG",
"TO", "COMMAND");

}

probe nd_syscall.kill

{

        target[tid()]
= uint_arg(1);

        signal[tid()]
= uint_arg(2);

}

probe nd_syscall.kill.return

{

        if (signal[tid()]
== 15 && target[tid()] != 0) {

    printf("%-6d %-12s %-5d %-6d %12s\n"

                    ,
pid(), execname()

                    ,
signal[tid()]

                    ,
target[tid()], pid2execname(target[tid()]));

        }

        delete target[tid()];

        delete signal[tid()];

}

The output of last night was:

FROM   COMMAND      SIG
  TO     COMMAND

30068  systemd-udevd 15  
 14151  systemd-udevd

30068  systemd-udevd 15  
 14836  systemd-udevd

(...)

6482   postmaster   15  
 30649    postmaster

6482   postmaster   15  
 30648    postmaster

6482   postmaster   15  
 30647    postmaster

6482   postmaster   15  
 30646    postmaster

6482   postmaster   15  
 30645    postmaster

6482   postmaster   15  
 30659    postmaster

6482   postmaster   15  
 30658    postmaster

6482   postmaster   15  
 30657    postmaster

6482   postmaster   15  
 30656    postmaster

6482   postmaster   15  
 30655    postmaster

6482   postmaster   15  
 2065     postmaster

6482   postmaster   15  
 2064     postmaster

6482   postmaster   15  
 2063     postmaster

Several of these TO-pid's match those
in the PG log.

$ ps aux | grep 6482

postgres  6482  0.0  0.5
51755776 1043872 ?    Ss   Sep28   3:25 /usr/pgsql-11/bin/postmaster
-D /data/11/data

Am I correct in concluding that postmaster
is killing its own processes? If so, what is going on here? And more importantly,
what do I do about it?

Regards,

Alban Hertroys.

P.S. I'm mailing from my work account,
as this is work related. You may remember me from my private account.

Alban  Hertroys     

D: +31 (0)53 4 888 888  | T:
+31 (0)53 4888 888 | E: alban.hertroys@xxxxxxxxxxxxxxxxxxxx

Apollo Vredestein
B.V.| Ir. E.L.C. Schiffstraat 370, 7547 RD Enschede, The Netherlands

Chamber of Commerce
number: 34223268

The
information contained in this e-mail is intended solely for the use of
the individual or entity to whom it is addressed. If you are not the intended
recipient, you are hereby notified that any disclosure, copying, distribution
or action in relation to the contents of this information is strictly prohibited
and may be unlawful and request you to delete this message and any attachments
and advise the sender by return e-mail. The confidentiality of this message
is not warranted. Apollo Vredestein and its subsidiaries rule out any and
every liability resulting from this or any other electronic transmission

 Please consider the environment before printing this e-mail