Etc/UTC
From: Adrian Klaver <adrian.klaver@xxxxxxxxxxx>
Sent: Thursday, May 9, 2019 7:51 AM To: Julie Nishimura Cc: pgsql-general@xxxxxxxxxxxxxxxxxxxx; pgsql-general Subject: Re: postgresql 9.4 restart On 5/8/19 4:35 PM, Julie Nishimura wrote:
> Thanks again for your help. > > 1) max_connections is set to 500 in our config file > 2) our average conn load is 300-400, however, I am not sure if it was > close to max_conn at the moment of restart > 3) we do not have replication on this staging server > 4) These are the snippets from auth logs with the corresponding time: I do not see anything that matches. Do you know what time zone the log entries are in? cat /etc/timezone might help. > > /var/message/syslog: > > May 7 00:02:01 hitw-esg-portal-stage-db-1 CRON[10290]: (postgres) CMD > (/usr/bin/find /data/postgresql/9.4/log -type f ! -name '*.bz2' -mmin > +480 -exec bzip2 {} \;) > May 7 00:03:01 hitw-esg-portal-stage-db-1 CRON[10323]: (postgres) CMD > (/usr/bin/find /data/postgresql/9.4/log -type f -empty -mmin +60 -delete) > May 7 00:05:01 hitw-esg-portal-stage-db-1 CRON[10379]: (root) CMD > (command -v debian-sa1 > /dev/null && debian-sa1 1 1) > May 7 00:15:01 hitw-esg-portal-stage-db-1 CRON[10655]: (root) CMD > (command -v debian-sa1 > /dev/null && debian-sa1 1 1) > May 7 00:17:01 hitw-esg-portal-stage-db-1 CRON[10705]: (root) CMD ( > cd / && run-parts --report /etc/cron.hourly) > May 7 00:25:01 hitw-esg-portal-stage-db-1 CRON[10902]: (root) CMD > (command -v debian-sa1 > /dev/null && debian-sa1 1 1) > May 7 00:27:32 hitw-esg-portal-stage-db-1 crontab[11401]: (root) LIST > (root) > May 7 00:27:32 hitw-esg-portal-stage-db-1 crontab[11403]: (root) LIST > (postgres) > May 7 00:28:24 hitw-esg-portal-stage-db-1 puppet-agent[10914]: > (/Stage[main]/Ci::Db/Package[postgresql-plpython-]/ensure) ensure > changed 'purged' to 'present' > May 7 00:28:40 hitw-esg-portal-stage-db-1 puppet-agent[10914]: Finished > catalog run in 106.68 seconds > May 7 00:35:01 hitw-esg-portal-stage-db-1 CRON[14398]: (root) CMD > (command -v debian-sa1 > /dev/null && debian-sa1 1 1) > > this is from auth.log: > > May 7 00:01:01 hitw-esg-portal-stage-db-1 CRON[10265]: > pam_unix(cron:session): session opened for user postgres by (uid=0) > May 7 00:01:01 hitw-esg-portal-stage-db-1 CRON[10265]: > pam_unix(cron:session): session closed for user postgres > May 7 00:02:01 hitw-esg-portal-stage-db-1 CRON[10289]: > pam_unix(cron:session): session opened for user postgres by (uid=0) > May 7 00:02:01 hitw-esg-portal-stage-db-1 CRON[10289]: > pam_unix(cron:session): session closed for user postgres > May 7 00:02:29 hitw-esg-portal-stage-db-1 sshd[10306]: Connection > closed by ::1 port 49936 [preauth] > May 7 00:03:01 hitw-esg-portal-stage-db-1 CRON[10322]: > pam_unix(cron:session): session opened for user postgres by (uid=0) > May 7 00:03:01 hitw-esg-portal-stage-db-1 CRON[10322]: > pam_unix(cron:session): session closed for user postgres > May 7 00:04:30 hitw-esg-portal-stage-db-1 sshd[10365]: Connection > closed by ::1 port 49942 [preauth] > May 7 00:05:01 hitw-esg-portal-stage-db-1 CRON[10378]: > pam_unix(cron:session): session opened for user root by (uid=0) > May 7 00:05:01 hitw-esg-portal-stage-db-1 CRON[10378]: > pam_unix(cron:session): session closed for user root > May 7 00:06:31 hitw-esg-portal-stage-db-1 sshd[10422]: Connection > closed by ::1 port 49948 [preauth] > May 7 00:08:33 hitw-esg-portal-stage-db-1 sshd[10483]: Connection > closed by ::1 port 49954 [preauth] > May 7 00:10:34 hitw-esg-portal-stage-db-1 sshd[10540]: Connection > closed by ::1 port 49960 [preauth] > May 7 00:12:35 hitw-esg-portal-stage-db-1 sshd[10594]: Connection > closed by ::1 port 49966 [preauth] > May 7 00:14:36 hitw-esg-portal-stage-db-1 sshd[10642]: Connection > closed by ::1 port 49972 [preauth] > May 7 00:15:01 hitw-esg-portal-stage-db-1 CRON[10654]: > pam_unix(cron:session): session opened for user root by (uid=0) > May 7 00:15:01 hitw-esg-portal-stage-db-1 CRON[10654]: > pam_unix(cron:session): session closed for user root > May 7 00:16:37 hitw-esg-portal-stage-db-1 sshd[10691]: Connection > closed by ::1 port 49978 [preauth] > May 7 00:17:01 hitw-esg-portal-stage-db-1 CRON[10704]: > pam_unix(cron:session): session opened for user root by (uid=0) > May 7 00:17:01 hitw-esg-portal-stage-db-1 CRON[10704]: > pam_unix(cron:session): session closed for user root > May 7 00:18:39 hitw-esg-portal-stage-db-1 sshd[10741]: Connection > closed by ::1 port 49984 [preauth] > May 7 00:20:40 hitw-esg-portal-stage-db-1 sshd[10787]: Connection > closed by ::1 port 49990 [preauth] > May 7 00:22:41 hitw-esg-portal-stage-db-1 sshd[10833]: Connection > closed by ::1 port 49996 [preauth] > May 7 00:24:42 hitw-esg-portal-stage-db-1 sshd[10890]: Connection > closed by ::1 port 50002 [preauth] > May 7 00:25:01 hitw-esg-portal-stage-db-1 CRON[10901]: > pam_unix(cron:session): session opened for user root by (uid=0) > May 7 00:25:01 hitw-esg-portal-stage-db-1 CRON[10901]: > pam_unix(cron:session): session closed for user root > May 7 00:25:44 hitw-esg-portal-stage-db-1 sudo: root : TTY=unknown > ; PWD=/ ; USER=postgres ; COMMAND=/usr/bin/psql -q -t -P > format=unaligned -c SELECT datname, pg_encoding_to_char(encoding) FROM > pg_database WHERE datistemplate = false AND datname != 'postgres' > May 7 00:25:44 hitw-esg-portal-stage-db-1 sudo: pam_unix(sudo:session): > session opened for user postgres by (uid=0) > May 7 00:25:44 hitw-esg-portal-stage-db-1 sudo: pam_unix(sudo:session): > session closed for user postgres > May 7 00:25:51 hitw-esg-portal-stage-db-1 sudo: root : TTY=unknown > ; PWD=/ ; USER=postgres ; COMMAND=/usr/bin/psql -q -t -P > format=unaligned -c SHOW server_version > May 7 00:25:51 hitw-esg-portal-stage-db-1 sudo: pam_unix(sudo:session): > session opened for user postgres by (uid=0) > May 7 00:25:51 hitw-esg-portal-stage-db-1 sudo: pam_unix(sudo:session): > session closed for user postgres > May 7 00:26:44 hitw-esg-portal-stage-db-1 sshd[11327]: Connection > closed by ::1 port 50018 [preauth] > > > For me, it seems like there is some cron job ran shortly after midnight > (either from root or through puppet and restarted the server, because > maybe it was close to max connections - just a guess). What do you think? > > -Julie N > > > ------------------------------------------------------------------------ > *From:* Adrian Klaver <adrian.klaver@xxxxxxxxxxx> > *Sent:* Wednesday, May 8, 2019 2:17 PM > *To:* Julie Nishimura > *Cc:* pgsql-general@xxxxxxxxxxxxxxxxxxxx; pgsql-general > *Subject:* Re: postgresql 9.4 restart > On 5/8/19 2:01 PM, Julie Nishimura wrote: >> Adrian, thanks for your reply! >> >> >> >> PostgreSQL 9.4.21 on x86_64-pc-linux-gnu (Ubuntu 9.4.21-1.pgdg16.04+1), >> compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit >> >> This is what Ive found in logs prior to the restart: > >> UTC,"hitwise","hitwise_us_stg",8677,"10.200.193.58:49428",5cd0cc7a.21e5,3,"startup",2019-05-07 >> 00:08:26 UTC,448/970,0,FATAL,53300,"remaining connection slots are >> reserved for non-replication superuser connections",,,,,,,,,"" >> 2019-05-07 00:08:26.778 > > The above is not good. > > What do you have set for max_connections? > > How many connections are you trying against the database? > > Do you have replication set up to other Postgres instances(uk, au, us)? > > What does the system log show from the same time period? > > > >> UTC,"hitwise","hitwise_us_stg",8675,"10.200.193.58:49424",5cd0cc7a.21e3,6,"idle",2019-05-07 >> 00:08:26 UTC,,0,LOG,00000,"disconnection: session time: 0:00:00.021 >> user=hitwise database=hitwise_us_stg host=10.200.193.58 >> port=49424",,,,,,,,,"" >> 2019-05-07 00:08:26.779 >> UTC,"hitwise","hitwise_us_stg",8676,"10.200.193.58:49426",5cd0cc7a.21e4,6,"idle",2019-05-07 >> 00:08:26 UTC,,0,LOG,00000,"disconnection: session time: 0:00:00.018 >> user=hitwise database=hitwise_us_stg host=10.200.193.58 >> port=49426",,,,,,,,,"" >> 2019-05-07 00:08:28.060 UTC,,,4117,,5cbfb2c9.1015,5,,2019-04-24 00:50:17 >> UTC,,0,LOG,00000,"received fast shutdown request",,,,,,,,,"" >> 2019-05-07 00:08:28.060 UTC,,,4117,,5cbfb2c9.1015,6,,2019-04-24 00:50:17 >> UTC,,0,LOG,00000,"aborting any active transactions",,,,,,,,,"" >> 2019-05-07 00:08:28.060 >> UTC,"hitwise","hitwise_uk_stg",8558,"10.200.193.58:49246",5cd0cc20.216e,6,"idle",2019-05-07 >> 00:06:56 UTC,54/0,0,FATAL,57P01,"terminating connection due to >> administrator command",,,,,,,,,"" >> 2019-05-07 00:08:28.060 >> UTC,"hitwise","hitwise_uk_stg",8559,"10.200.193.58:49248",5cd0cc20.216f,6,"idle",2019-05-07 >> 00:06:56 UTC,56/0,0,FATAL,57P01,"terminating connection due to >> administrator command",,,,,,,,,"" >> 2019-05-07 00:08:28.060 >> UTC,"hitwise","hitwise_uk_stg",8561,"10.200.193.58:49252",5cd0cc20.2171,6,"idle",2019-05-07 >> 00:06:56 UTC,429/0,0,FATAL,57P01,"terminating connection due to >> administrator command",,,,,,,,,"" >> 2019-05-07 00:08:28.060 >> UTC,"hitwise","hitwise_au_stg",8578,"10.200.250.49:31100",5cd0cc33.2182,6,"idle",2019-05-07 >> 00:07:15 UTC,19/0,0,FATAL,57P01,"terminating connection due to >> administrator command",,,,,,,,,"" >> 2019-05-07 00:08:28.060 >> UTC,"hitwise","hitwise_us_stg",8626,"10.200.250.49:31332",5cd0cc61.21b2,6,"idle",2019-05-07 >> 00:08:01 UTC,3/0,0,FATAL,57P01,"terminating connection due to >> administrator command",,,,,,,,,"" >> 2019-05-07 00:08:28.060 >> UTC,"hitwise","hitwise_uk_stg",8560,"10.200.193.58:49250",5cd0cc20.2170,6,"idle",2019-05-07 >> 00:06:56 UTC,227/0,0,FATAL,57P01,"terminating connection due to >> administrator command",,,,,,,,,"" >> 2019-05-07 00:08:28.060 >> UTC,"hitwise","hitwise_uk_stg",8558,"10.200.193.58:49246",5cd0cc20.216e,7,"idle",2019-05-07 >> 00:06:56 UTC,,0,LOG,00000,"disconnection: session time: 0:01:31.339 >> user=hitwise database=hitwise_uk_stg host=10.200.193.58 >> port=49246",,,,,,,,,"" >> 2019-05-07 00:08:28.060 >> UTC,"hitwise","hitwise_au_stg",8578,"10.200.250.49:31100",5cd0cc33.2182,7,"idle",2019-05-07 >> 00:07:15 UTC,,0,LOG,00000,"disconnection: session time: 0:01:12.215 >> user=hitwise database=hitwise_au_stg host=10.200.250.49 >> port=31100",,,,,,,,,"" >> >> Does it help? >> >> ------------------------------------------------------------------------ >> *From:* Adrian Klaver <adrian.klaver@xxxxxxxxxxx> >> *Sent:* Wednesday, May 8, 2019 12:07 PM >> *To:* Julie Nishimura >> *Cc:* pgsql-general@xxxxxxxxxxxxxxxxxxxx; pgsql-general >> *Subject:* Re: postgresql 9.4 restart >> On 5/8/19 11:00 AM, Julie Nishimura wrote: >>> Hello, >>> Our staging 9.4 postgresql has been running couple of weeks with no >>> problem, but yesterday we saw this error from one of our services that >>> connects to the staging DBs: >>> >>> FATAL: terminating connection due to >>> administrator command; nested exception is >>> org.postgresql.util.PSQLException: FATAL: terminating connection due to >>> administrator command”, >> >> Distro and version? >> >> Well it originated in Java code if that helps? >> >> Anything in the Postgres logs just prior to 'FATAL: terminating >> connection ...' that might be relevant? >> >> Do the system logs showing anything relevant e.g. the OOM killer >> stopping Postgres? >> >> >> >>> >>> Then I could see that postgresql got restarted, based on the output of >>> "select pg_postmaster_start_time()" >>> >>> We are running postgresql on >>> Linux xxx-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 >>> x86_64 x86_64 GNU/Linux >>> >>> How I can find out what cause the restart? Any ideas are welcome. Thanks! >>> >>> -Julie N >>> >>> >> >> >> -- >> Adrian Klaver >> adrian.klaver@xxxxxxxxxxx > > > -- > Adrian Klaver > adrian.klaver@xxxxxxxxxxx -- Adrian Klaver adrian.klaver@xxxxxxxxxxx |