On 5/8/19 4:35 PM, Julie Nishimura wrote:
Thanks again for your help.
1) max_connections is set to 500 in our config file
2) our average conn load is 300-400, however, I am not sure if it was
close to max_conn at the moment of restart
3) we do not have replication on this staging server
4) These are the snippets from auth logs with the corresponding time:
I do not see anything that matches.
Do you know what time zone the log entries are in?
cat /etc/timezone might help.
/var/message/syslog:
May 7 00:02:01 hitw-esg-portal-stage-db-1 CRON[10290]: (postgres) CMD
(/usr/bin/find /data/postgresql/9.4/log -type f ! -name '*.bz2' -mmin
+480 -exec bzip2 {} \;)
May 7 00:03:01 hitw-esg-portal-stage-db-1 CRON[10323]: (postgres) CMD
(/usr/bin/find /data/postgresql/9.4/log -type f -empty -mmin +60 -delete)
May 7 00:05:01 hitw-esg-portal-stage-db-1 CRON[10379]: (root) CMD
(command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 7 00:15:01 hitw-esg-portal-stage-db-1 CRON[10655]: (root) CMD
(command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 7 00:17:01 hitw-esg-portal-stage-db-1 CRON[10705]: (root) CMD (
cd / && run-parts --report /etc/cron.hourly)
May 7 00:25:01 hitw-esg-portal-stage-db-1 CRON[10902]: (root) CMD
(command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 7 00:27:32 hitw-esg-portal-stage-db-1 crontab[11401]: (root) LIST
(root)
May 7 00:27:32 hitw-esg-portal-stage-db-1 crontab[11403]: (root) LIST
(postgres)
May 7 00:28:24 hitw-esg-portal-stage-db-1 puppet-agent[10914]:
(/Stage[main]/Ci::Db/Package[postgresql-plpython-]/ensure) ensure
changed 'purged' to 'present'
May 7 00:28:40 hitw-esg-portal-stage-db-1 puppet-agent[10914]: Finished
catalog run in 106.68 seconds
May 7 00:35:01 hitw-esg-portal-stage-db-1 CRON[14398]: (root) CMD
(command -v debian-sa1 > /dev/null && debian-sa1 1 1)
this is from auth.log:
May 7 00:01:01 hitw-esg-portal-stage-db-1 CRON[10265]:
pam_unix(cron:session): session opened for user postgres by (uid=0)
May 7 00:01:01 hitw-esg-portal-stage-db-1 CRON[10265]:
pam_unix(cron:session): session closed for user postgres
May 7 00:02:01 hitw-esg-portal-stage-db-1 CRON[10289]:
pam_unix(cron:session): session opened for user postgres by (uid=0)
May 7 00:02:01 hitw-esg-portal-stage-db-1 CRON[10289]:
pam_unix(cron:session): session closed for user postgres
May 7 00:02:29 hitw-esg-portal-stage-db-1 sshd[10306]: Connection
closed by ::1 port 49936 [preauth]
May 7 00:03:01 hitw-esg-portal-stage-db-1 CRON[10322]:
pam_unix(cron:session): session opened for user postgres by (uid=0)
May 7 00:03:01 hitw-esg-portal-stage-db-1 CRON[10322]:
pam_unix(cron:session): session closed for user postgres
May 7 00:04:30 hitw-esg-portal-stage-db-1 sshd[10365]: Connection
closed by ::1 port 49942 [preauth]
May 7 00:05:01 hitw-esg-portal-stage-db-1 CRON[10378]:
pam_unix(cron:session): session opened for user root by (uid=0)
May 7 00:05:01 hitw-esg-portal-stage-db-1 CRON[10378]:
pam_unix(cron:session): session closed for user root
May 7 00:06:31 hitw-esg-portal-stage-db-1 sshd[10422]: Connection
closed by ::1 port 49948 [preauth]
May 7 00:08:33 hitw-esg-portal-stage-db-1 sshd[10483]: Connection
closed by ::1 port 49954 [preauth]
May 7 00:10:34 hitw-esg-portal-stage-db-1 sshd[10540]: Connection
closed by ::1 port 49960 [preauth]
May 7 00:12:35 hitw-esg-portal-stage-db-1 sshd[10594]: Connection
closed by ::1 port 49966 [preauth]
May 7 00:14:36 hitw-esg-portal-stage-db-1 sshd[10642]: Connection
closed by ::1 port 49972 [preauth]
May 7 00:15:01 hitw-esg-portal-stage-db-1 CRON[10654]:
pam_unix(cron:session): session opened for user root by (uid=0)
May 7 00:15:01 hitw-esg-portal-stage-db-1 CRON[10654]:
pam_unix(cron:session): session closed for user root
May 7 00:16:37 hitw-esg-portal-stage-db-1 sshd[10691]: Connection
closed by ::1 port 49978 [preauth]
May 7 00:17:01 hitw-esg-portal-stage-db-1 CRON[10704]:
pam_unix(cron:session): session opened for user root by (uid=0)
May 7 00:17:01 hitw-esg-portal-stage-db-1 CRON[10704]:
pam_unix(cron:session): session closed for user root
May 7 00:18:39 hitw-esg-portal-stage-db-1 sshd[10741]: Connection
closed by ::1 port 49984 [preauth]
May 7 00:20:40 hitw-esg-portal-stage-db-1 sshd[10787]: Connection
closed by ::1 port 49990 [preauth]
May 7 00:22:41 hitw-esg-portal-stage-db-1 sshd[10833]: Connection
closed by ::1 port 49996 [preauth]
May 7 00:24:42 hitw-esg-portal-stage-db-1 sshd[10890]: Connection
closed by ::1 port 50002 [preauth]
May 7 00:25:01 hitw-esg-portal-stage-db-1 CRON[10901]:
pam_unix(cron:session): session opened for user root by (uid=0)
May 7 00:25:01 hitw-esg-portal-stage-db-1 CRON[10901]:
pam_unix(cron:session): session closed for user root
May 7 00:25:44 hitw-esg-portal-stage-db-1 sudo: root : TTY=unknown
; PWD=/ ; USER=postgres ; COMMAND=/usr/bin/psql -q -t -P
format=unaligned -c SELECT datname, pg_encoding_to_char(encoding) FROM
pg_database WHERE datistemplate = false AND datname != 'postgres'
May 7 00:25:44 hitw-esg-portal-stage-db-1 sudo: pam_unix(sudo:session):
session opened for user postgres by (uid=0)
May 7 00:25:44 hitw-esg-portal-stage-db-1 sudo: pam_unix(sudo:session):
session closed for user postgres
May 7 00:25:51 hitw-esg-portal-stage-db-1 sudo: root : TTY=unknown
; PWD=/ ; USER=postgres ; COMMAND=/usr/bin/psql -q -t -P
format=unaligned -c SHOW server_version
May 7 00:25:51 hitw-esg-portal-stage-db-1 sudo: pam_unix(sudo:session):
session opened for user postgres by (uid=0)
May 7 00:25:51 hitw-esg-portal-stage-db-1 sudo: pam_unix(sudo:session):
session closed for user postgres
May 7 00:26:44 hitw-esg-portal-stage-db-1 sshd[11327]: Connection
closed by ::1 port 50018 [preauth]
For me, it seems like there is some cron job ran shortly after midnight
(either from root or through puppet and restarted the server, because
maybe it was close to max connections - just a guess). What do you think?
-Julie N
------------------------------------------------------------------------
*From:* Adrian Klaver <adrian.klaver@xxxxxxxxxxx>
*Sent:* Wednesday, May 8, 2019 2:17 PM
*To:* Julie Nishimura
*Cc:* pgsql-general@xxxxxxxxxxxxxxxxxxxx; pgsql-general
*Subject:* Re: postgresql 9.4 restart
On 5/8/19 2:01 PM, Julie Nishimura wrote:
Adrian, thanks for your reply!
PostgreSQL 9.4.21 on x86_64-pc-linux-gnu (Ubuntu 9.4.21-1.pgdg16.04+1),
compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit
This is what Ive found in logs prior to the restart:
UTC,"hitwise","hitwise_us_stg",8677,"10.200.193.58:49428",5cd0cc7a.21e5,3,"startup",2019-05-07
00:08:26 UTC,448/970,0,FATAL,53300,"remaining connection slots are
reserved for non-replication superuser connections",,,,,,,,,""
2019-05-07 00:08:26.778
The above is not good.
What do you have set for max_connections?
How many connections are you trying against the database?
Do you have replication set up to other Postgres instances(uk, au, us)?
What does the system log show from the same time period?
UTC,"hitwise","hitwise_us_stg",8675,"10.200.193.58:49424",5cd0cc7a.21e3,6,"idle",2019-05-07
00:08:26 UTC,,0,LOG,00000,"disconnection: session time: 0:00:00.021
user=hitwise database=hitwise_us_stg host=10.200.193.58
port=49424",,,,,,,,,""
2019-05-07 00:08:26.779
UTC,"hitwise","hitwise_us_stg",8676,"10.200.193.58:49426",5cd0cc7a.21e4,6,"idle",2019-05-07
00:08:26 UTC,,0,LOG,00000,"disconnection: session time: 0:00:00.018
user=hitwise database=hitwise_us_stg host=10.200.193.58
port=49426",,,,,,,,,""
2019-05-07 00:08:28.060 UTC,,,4117,,5cbfb2c9.1015,5,,2019-04-24 00:50:17
UTC,,0,LOG,00000,"received fast shutdown request",,,,,,,,,""
2019-05-07 00:08:28.060 UTC,,,4117,,5cbfb2c9.1015,6,,2019-04-24 00:50:17
UTC,,0,LOG,00000,"aborting any active transactions",,,,,,,,,""
2019-05-07 00:08:28.060
UTC,"hitwise","hitwise_uk_stg",8558,"10.200.193.58:49246",5cd0cc20.216e,6,"idle",2019-05-07
00:06:56 UTC,54/0,0,FATAL,57P01,"terminating connection due to
administrator command",,,,,,,,,""
2019-05-07 00:08:28.060
UTC,"hitwise","hitwise_uk_stg",8559,"10.200.193.58:49248",5cd0cc20.216f,6,"idle",2019-05-07
00:06:56 UTC,56/0,0,FATAL,57P01,"terminating connection due to
administrator command",,,,,,,,,""
2019-05-07 00:08:28.060
UTC,"hitwise","hitwise_uk_stg",8561,"10.200.193.58:49252",5cd0cc20.2171,6,"idle",2019-05-07
00:06:56 UTC,429/0,0,FATAL,57P01,"terminating connection due to
administrator command",,,,,,,,,""
2019-05-07 00:08:28.060
UTC,"hitwise","hitwise_au_stg",8578,"10.200.250.49:31100",5cd0cc33.2182,6,"idle",2019-05-07
00:07:15 UTC,19/0,0,FATAL,57P01,"terminating connection due to
administrator command",,,,,,,,,""
2019-05-07 00:08:28.060
UTC,"hitwise","hitwise_us_stg",8626,"10.200.250.49:31332",5cd0cc61.21b2,6,"idle",2019-05-07
00:08:01 UTC,3/0,0,FATAL,57P01,"terminating connection due to
administrator command",,,,,,,,,""
2019-05-07 00:08:28.060
UTC,"hitwise","hitwise_uk_stg",8560,"10.200.193.58:49250",5cd0cc20.2170,6,"idle",2019-05-07
00:06:56 UTC,227/0,0,FATAL,57P01,"terminating connection due to
administrator command",,,,,,,,,""
2019-05-07 00:08:28.060
UTC,"hitwise","hitwise_uk_stg",8558,"10.200.193.58:49246",5cd0cc20.216e,7,"idle",2019-05-07
00:06:56 UTC,,0,LOG,00000,"disconnection: session time: 0:01:31.339
user=hitwise database=hitwise_uk_stg host=10.200.193.58
port=49246",,,,,,,,,""
2019-05-07 00:08:28.060
UTC,"hitwise","hitwise_au_stg",8578,"10.200.250.49:31100",5cd0cc33.2182,7,"idle",2019-05-07
00:07:15 UTC,,0,LOG,00000,"disconnection: session time: 0:01:12.215
user=hitwise database=hitwise_au_stg host=10.200.250.49
port=31100",,,,,,,,,""
Does it help?
------------------------------------------------------------------------
*From:* Adrian Klaver <adrian.klaver@xxxxxxxxxxx>
*Sent:* Wednesday, May 8, 2019 12:07 PM
*To:* Julie Nishimura
*Cc:* pgsql-general@xxxxxxxxxxxxxxxxxxxx; pgsql-general
*Subject:* Re: postgresql 9.4 restart
On 5/8/19 11:00 AM, Julie Nishimura wrote:
Hello,
Our staging 9.4 postgresql has been running couple of weeks with no
problem, but yesterday we saw this error from one of our services that
connects to the staging DBs:
FATAL: terminating connection due to
administrator command; nested exception is
org.postgresql.util.PSQLException: FATAL: terminating connection due to
administrator command”,
Distro and version?
Well it originated in Java code if that helps?
Anything in the Postgres logs just prior to 'FATAL: terminating
connection ...' that might be relevant?
Do the system logs showing anything relevant e.g. the OOM killer
stopping Postgres?
Then I could see that postgresql got restarted, based on the output of
"select pg_postmaster_start_time()"
We are running postgresql on
Linux xxx-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64
x86_64 x86_64 GNU/Linux
How I can find out what cause the restart? Any ideas are welcome. Thanks!
-Julie N
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx