We were hit by an interesting addition to systemd: it appears that logging in to or out of the machine with the user account used to start the postgres service has a catastrophic effect. A systemd process deleted the PostgreSQL.NNNN file in /dev/shm (tmpfs).
errors:
Jan 21 10:30:01 stg1 systemd: Started Session 3396 of user admin.
Jan 21 10:30:01 stg1 systemd: Starting Session 3396 of user admin.
Jan 21 10:30:01 stg1 postgres[31239]: [3-1] FATAL: semctl(13139971, 11, SETVAL, 0) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [3-1] LOG: server process (PID 31239) exited with exit code 1
Jan 21 10:30:01 stg1 postgres[28042]: [4-1] LOG: terminating any other active server processes
Jan 21 10:30:01 stg1 postgres[28047]: [3-1] WARNING: terminating connection because of crash of another server process
Jan 21 10:30:01 stg1 postgres[28047]: [3-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Jan 21 10:30:01 stg1 postgres[28047]: [3-3] HINT: In a moment you should be able to reconnect to the database and repeat your command.
Jan 21 10:30:01 stg1 postgres[28042]: [5-1] LOG: all server processes terminated; reinitializing
Jan 21 10:30:01 stg1 postgres[28042]: [6-1] LOG: could not remove shared memory segment "/PostgreSQL.1804289383": No such file or directory
Jan 21 10:30:01 stg1 postgres[28042]: [7-1] LOG: semctl(13041664, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [8-1] LOG: semctl(13074433, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [9-1] LOG: semctl(13107202, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [10-1] LOG: semctl(13139971, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [11-1] LOG: semctl(13172740, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[31260]: [12-1] LOG: database system was interrupted; last known up at 2016-01-21 10:23:17 PST
Jan 21 10:30:01 stg1 postgres[31260]: [13-1] LOG: database system was not properly shut down; automatic recovery in progress
Jan 21 10:30:01 stg1 postgres[31260]: [14-1] LOG: record with zero length at 130/66154E90
Jan 21 10:30:01 stg1 postgres[31260]: [15-1] LOG: redo is not required
Jan 21 10:30:01 stg1 postgres[31260]: [16-1] LOG: MultiXact member wraparound protections are now enabled
Jan 21 10:30:01 stg1 postgres[28042]: [12-1] LOG: database system is ready to accept connections
Jan 21 10:30:01 stg1 postgres[31267]: [12-1] LOG: autovacuum launcher started
Jan 21 10:30:26 stg1 systemd: Removed slice user-1001.slice.
Jan 21 10:30:26 stg1 systemd: Stopping user-1001.slice.
Jan 21 10:30:35 stg1 systemd: Created slice user-1001.slice.
Jan 21 10:30:35 stg1 systemd: Starting user-1001.slice.
Jan 21 10:30:35 stg1 systemd-logind: New session 3397 of user admin.
$ psql postgres
psql: FATAL: semctl(11337731, 11, SETVAL, 0) failed: Invalid argument
The log shows PostgreSQL crashing and restarting, after which a second attempt connects:
$ psql postgres
psql (9.4.5)
Type "help" for help.
postgres=#
The PostgreSQL file in /dev/shm (tmpfs) appears to have been removed by a systemd process:
$ ls -lt /dev/shm/
total 84
-rw------- 1 admin admin 3916 Jan 21 09:05 PostgreSQL.1804289383 ==> deleted, causing the errors above
-r-------- 1 gdm gdm 67108904 Jan 20 18:38 pulse-shm-3708236591
-r-------- 1 gdm gdm 67108904 Jan 20 18:38 pulse-shm-4055075926
-r-------- 1 gdm gdm 67108904 Jan 20 18:38 pulse-shm-3910933030
-r-------- 1 gdm gdm 67108904 Jan 20 18:38 pulse-shm-979612067
OS:
$ cat /etc/centos-release
CentOS Linux release 7.1.1503 (Core)
Postgres version:
postgres=# select version();
-[ RECORD 1 ]---------------------------------------------------------------------------------------------------------
version | PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
$ cat /etc/systemd/logind.conf
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See logind.conf(5) for details.
[Login]
#NAutoVTs=6
#ReserveVT=6
#KillUserProcesses=no
#KillOnlyUsers=
#KillExcludeUsers=root
#InhibitDelayMaxSec=5
#HandlePowerKey=poweroff
#HandleSuspendKey=suspend
#HandleHibernateKey=hibernate
#HandleLidSwitch=suspend
#HandleLidSwitchDocked=ignore
#PowerKeyIgnoreInhibited=no
#SuspendKeyIgnoreInhibited=no
#HibernateKeyIgnoreInhibited=no
#LidSwitchIgnoreInhibited=yes
#IdleAction=ignore
#IdleActionSec=30min
#RuntimeDirectorySize=10% =>> new entry
#RemoveIPC=yes =>> new entry
The culprit could be a recent update that brought systemd to version 219:
Jan 19 13:29:23 Updated: systemd-libs-219-19.el7.x86_64
Jan 19 13:29:28 Updated: systemd-219-19.el7.x86_64
Jan 19 13:29:39 Updated: systemd-sysv-219-19.el7.x86_64
Jan 19 13:29:40 Updated: systemd-python-219-19.el7.x86_64
Is anybody on the list having the same issue? As a workaround, we have changed the two new entries in logind.conf from:
#RuntimeDirectorySize=10%
#RemoveIPC=yes
to
RuntimeDirectorySize=1%
RemoveIPC=no
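To check which value logind would take from the file, the edited entries can be read back with a small helper. This is only a sketch: `effective_setting` is a hypothetical name, it assumes (as the file's own header says) that commented-out entries are just the compile-time defaults, and it looks at a single file only.

```shell
#!/bin/sh
# effective_setting FILE KEY DEFAULT
# Print the value of the last uncommented "KEY=value" line in FILE,
# or DEFAULT if the key is missing or only present as a comment.
# (Hypothetical helper, not part of systemd.)
effective_setting() {
    file=$1
    key=$2
    default=$3
    # Lines starting with '#' do not match, so commented defaults are skipped.
    value=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1)
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf '%s\n' "$default"
    fi
}
```

For example, `effective_setting /etc/systemd/logind.conf RemoveIPC yes` should print `no` once the workaround is in place. The running systemd-logind only picks up the change after it re-reads its configuration, for instance with `systemctl restart systemd-logind` or a reboot.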
Setting RuntimeDirectorySize to 1% is optional. When a user logs in to the server (for example over ssh), a new tmpfs mount is created under /run/user/<uid>, sized at 10% of RAM by default (the machine has 512GB). This also looks like a change that came with the systemd update.
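The tmpfs sizes in the mount output below are consistent with that percentage. A quick sketch of the arithmetic, where `pct_of_kib` is a hypothetical helper and the total of 528028080 kB is not a measured value but inferred here by multiplying the 10% mount size (52802808k) by ten:

```shell
#!/bin/sh
# pct_of_kib TOTAL_KIB PERCENT
# Integer percentage of a size given in KiB, rounded down.
# (Hypothetical helper for illustration.)
pct_of_kib() {
    echo $(( $1 * $2 / 100 ))
}

# Assumed total, back-computed from size=52802808k at 10%.
total_kib=528028080

pct_of_kib "$total_kib" 10   # 52802808, the size seen before the change
pct_of_kib "$total_kib" 1    # 5280280, close to the 5280284k seen after the change
```

The 4k difference in the 1% case would be consistent with the size being rounded up to a whole memory page, though that is an assumption. On a live system the total can be read with `awk '/^MemTotal:/ {print $2}' /proc/meminfo`.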
before mods:
$ mount | grep tmpfs
devtmpfs on /dev type devtmpfs (rw,nosuid,size=264004800k,nr_inodes=66001200,mode=755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
tmpfs on /run/user/42 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700,uid=42,gid=42) ==> gdm
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700) ==> root
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700,uid=1001,gid=1001) ==> some user ~51G tmpfs (new feature?)
tmpfs on /run/user/6301 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700,uid=6301,gid=10000) ==> some user
before mods:
$ df -h | grep tmpfs
devtmpfs 252G 0 252G 0% /dev
tmpfs 252G 84K 252G 1% /dev/shm
tmpfs 252G 492M 252G 1% /run
tmpfs 252G 0 252G 0% /sys/fs/cgroup
tmpfs 51G 0 51G 0% /run/user/42
tmpfs 51G 0 51G 0% /run/user/0
after mods:
$ mount | grep tmpfs
devtmpfs on /dev type devtmpfs (rw,nosuid,size=264004800k,nr_inodes=66001200,mode=755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
tmpfs on /run/user/42 type tmpfs (rw,nosuid,nodev,relatime,size=5280284k,mode=700,uid=42,gid=42)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=5280284k,mode=700)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=5280284k,mode=700,uid=1001,gid=1001)
after mods:
$ df -h | grep tmpfs
devtmpfs 252G 0 252G 0% /dev
tmpfs 252G 88K 252G 1% /dev/shm
tmpfs 252G 19M 252G 1% /run
tmpfs 252G 0 252G 0% /sys/fs/cgroup
tmpfs 5.1G 12K 5.1G 1% /run/user/42
tmpfs 5.1G 0 5.1G 0% /run/user/0
Setting RemoveIPC to no works: with it disabled, the /dev/shm/PostgreSQL.NNNN file appears to stay intact.
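One way to verify this is to count the PostgreSQL segment files in /dev/shm before and after a login/logout cycle of the service user. A minimal sketch, with `count_pg_segments` being a hypothetical helper name:

```shell
#!/bin/sh
# count_pg_segments [DIR]
# Print how many PostgreSQL.* entries exist in DIR (default: /dev/shm).
# (Hypothetical helper for illustration.)
count_pg_segments() {
    dir=${1:-/dev/shm}
    n=0
    for f in "$dir"/PostgreSQL.*; do
        # When nothing matches, the pattern stays literal and -e is false.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}
```

Run `count_pg_segments` before logging out and again afterwards; the count should not drop while the server is running. The System V semaphores that the semctl errors refer to can be listed with `ipcs -s`.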
This is the forum post I found that can be linked to this:
--
regards
marie gezeala bacuño II