systemd deletes shared memory segment in /dev/shm/Postgresql.NNNNNN


 



We were hit by an interesting change in systemd: logging in to and out of the machine with the user account that started the postgres service has a catastrophic effect. A systemd process deletes the PostgreSQL.NNNN file in /dev/shm (tmpfs).

Errors from the system log:

Jan 21 10:30:01 stg1 systemd: Started Session 3396 of user admin.
Jan 21 10:30:01 stg1 systemd: Starting Session 3396 of user admin.
Jan 21 10:30:01 stg1 postgres[31239]: [3-1] FATAL:  semctl(13139971, 11, SETVAL, 0) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [3-1] LOG:  server process (PID 31239) exited with exit code 1
Jan 21 10:30:01 stg1 postgres[28042]: [4-1] LOG:  terminating any other active server processes
Jan 21 10:30:01 stg1 postgres[28047]: [3-1] WARNING:  terminating connection because of crash of another server process
Jan 21 10:30:01 stg1 postgres[28047]: [3-2] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Jan 21 10:30:01 stg1 postgres[28047]: [3-3] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
Jan 21 10:30:01 stg1 postgres[28042]: [5-1] LOG:  all server processes terminated; reinitializing
Jan 21 10:30:01 stg1 postgres[28042]: [6-1] LOG:  could not remove shared memory segment "/PostgreSQL.1804289383": No such file or directory
Jan 21 10:30:01 stg1 postgres[28042]: [7-1] LOG:  semctl(13041664, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [8-1] LOG:  semctl(13074433, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [9-1] LOG:  semctl(13107202, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [10-1] LOG:  semctl(13139971, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[28042]: [11-1] LOG:  semctl(13172740, 0, IPC_RMID, ...) failed: Invalid argument
Jan 21 10:30:01 stg1 postgres[31260]: [12-1] LOG:  database system was interrupted; last known up at 2016-01-21 10:23:17 PST
Jan 21 10:30:01 stg1 postgres[31260]: [13-1] LOG:  database system was not properly shut down; automatic recovery in progress
Jan 21 10:30:01 stg1 postgres[31260]: [14-1] LOG:  record with zero length at 130/66154E90
Jan 21 10:30:01 stg1 postgres[31260]: [15-1] LOG:  redo is not required
Jan 21 10:30:01 stg1 postgres[31260]: [16-1] LOG:  MultiXact member wraparound protections are now enabled
Jan 21 10:30:01 stg1 postgres[28042]: [12-1] LOG:  database system is ready to accept connections
Jan 21 10:30:01 stg1 postgres[31267]: [12-1] LOG:  autovacuum launcher started
Jan 21 10:30:26 stg1 systemd: Removed slice user-1001.slice.
Jan 21 10:30:26 stg1 systemd: Stopping user-1001.slice.
Jan 21 10:30:35 stg1 systemd: Created slice user-1001.slice.
Jan 21 10:30:35 stg1 systemd: Starting user-1001.slice.
Jan 21 10:30:35 stg1 systemd-logind: New session 3397 of user admin.
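
For anyone who wants to check their own logs for the same symptoms, here is a minimal sketch in Python. It only looks for the two messages that appear in the log above (the semctl "Invalid argument" failures and the missing shared memory segment). The function name and the regular expressions are my own illustration, not part of postgres or systemd.

```python
import re

# Patterns copied from the messages in the log above.
SEMCTL_FAILED = re.compile(
    r"semctl\((\d+), \d+, (\w+), [^)]*\) failed: Invalid argument"
)
SHM_MISSING = re.compile(
    r'could not remove shared memory segment "([^"]+)": No such file or directory'
)


def find_ipc_removal_symptoms(lines):
    """Return (semaphore set IDs, shared memory segment names) reported as gone."""
    sem_ids = set()
    shm_names = set()
    for line in lines:
        match = SEMCTL_FAILED.search(line)
        if match:
            sem_ids.add(int(match.group(1)))
        match = SHM_MISSING.search(line)
        if match:
            shm_names.add(match.group(1))
    return sorted(sem_ids), sorted(shm_names)
```

Feeding it the lines of the syslog file (wherever that lives on your system) returns the semaphore IDs and segment names that postgres could no longer find.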
 
$ psql postgres
psql: FATAL:  semctl(11337731, 11, SETVAL, 0) failed: Invalid argument
 
The log shows postgres crashing and restarting, after which connections work again:

$ psql postgres
psql (9.4.5)
Type "help" for help.

postgres=#

The PostgreSQL.NNNN file in /dev/shm (tmpfs) appears to be removed by a systemd process:

$ ls -lt /dev/shm/
total 84
-rw------- 1 admin admin     3916 Jan 21 09:05 PostgreSQL.1804289383  ==> deleted causing the errors above
-r-------- 1 gdm   gdm   67108904 Jan 20 18:38 pulse-shm-3708236591
-r-------- 1 gdm   gdm   67108904 Jan 20 18:38 pulse-shm-4055075926
-r-------- 1 gdm   gdm   67108904 Jan 20 18:38 pulse-shm-3910933030
-r-------- 1 gdm   gdm   67108904 Jan 20 18:38 pulse-shm-979612067

  
OS:
$ cat /etc/centos-release
CentOS Linux release 7.1.1503 (Core)

Postgres version:
postgres=# select version();
-[ RECORD 1 ]---------------------------------------------------------------------------------------------------------
version | PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
 

$ cat /etc/systemd/logind.conf

#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See logind.conf(5) for details.

[Login]
#NAutoVTs=6
#ReserveVT=6
#KillUserProcesses=no
#KillOnlyUsers=
#KillExcludeUsers=root
#InhibitDelayMaxSec=5
#HandlePowerKey=poweroff
#HandleSuspendKey=suspend
#HandleHibernateKey=hibernate
#HandleLidSwitch=suspend
#HandleLidSwitchDocked=ignore
#PowerKeyIgnoreInhibited=no
#SuspendKeyIgnoreInhibited=no
#HibernateKeyIgnoreInhibited=no
#LidSwitchIgnoreInhibited=yes
#IdleAction=ignore
#IdleActionSec=30min
#RuntimeDirectorySize=10%     =>>  new entry
#RemoveIPC=yes                    =>>  new entry
 
 
The culprit could be a recent update that brought systemd to version 219:
Jan 19 13:29:23 Updated: systemd-libs-219-19.el7.x86_64
Jan 19 13:29:28 Updated: systemd-219-19.el7.x86_64
Jan 19 13:29:39 Updated: systemd-sysv-219-19.el7.x86_64
Jan 19 13:29:40 Updated: systemd-python-219-19.el7.x86_64

Is anybody on the list having the same issue? As a workaround, we changed the two new entries in logind.conf from:
#RuntimeDirectorySize=10%
#RemoveIPC=yes

to
RuntimeDirectorySize=1%
RemoveIPC=no
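
To confirm which value is actually in effect in a given logind.conf, something like the following Python sketch could be used. It assumes the file is plain INI-style text with a [Login] section, as in the listing above, and treats a commented-out entry as "use the default". The default of "yes" for RemoveIPC is taken from the commented compile-time default shown in our file and may differ on other builds. The function name is hypothetical, and the sketch reads only the text it is given.

```python
import configparser


def login_setting(conf_text, key, default):
    """Return the value of `key` in the [Login] section of logind.conf text.

    Falls back to `default` when the key is absent or commented out.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case as written (RemoveIPC, not removeipc)
    parser.read_string(conf_text)
    if parser.has_section("Login") and parser.has_option("Login", key):
        return parser.get("Login", key).strip()
    return default
```

For example, login_setting(open("/etc/systemd/logind.conf").read(), "RemoveIPC", "yes") reports the value from the file on our machine.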

Setting RuntimeDirectorySize to 1% is optional. When a user logs in to the server (e.g. via ssh), a new tmpfs mount is created for that user under /run/user, sized at 10% of RAM (the machine has 512GB). This also looks like a change that came with the systemd update.

before mods:
$ mount | grep tmpfs
devtmpfs on /dev type devtmpfs (rw,nosuid,size=264004800k,nr_inodes=66001200,mode=755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
tmpfs on /run/user/42 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700,uid=42,gid=42)  ==> gdm
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700)  ==> root
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700,uid=1001,gid=1001) ==> some user ~51G tmpfs (new feature?)
tmpfs on /run/user/6301 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700,uid=6301,gid=10000)  ==> some user
 
before mods:
$ df -h | grep tmpfs
devtmpfs                                252G     0  252G   0% /dev
tmpfs                                   252G   84K  252G   1% /dev/shm 
tmpfs                                   252G  492M  252G   1% /run 
tmpfs                                   252G     0  252G   0% /sys/fs/cgroup 
tmpfs                                    51G     0   51G   0% /run/user/42 
tmpfs                                    51G     0   51G   0% /run/user/0         
 
after mods:
$ mount | grep tmpfs
devtmpfs on /dev type devtmpfs (rw,nosuid,size=264004800k,nr_inodes=66001200,mode=755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
tmpfs on /run/user/42 type tmpfs (rw,nosuid,nodev,relatime,size=5280284k,mode=700,uid=42,gid=42)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=5280284k,mode=700)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=5280284k,mode=700,uid=1001,gid=1001)

after mods:
$ df -h | grep tmpfs
devtmpfs                                252G     0  252G   0% /dev
tmpfs                                   252G   88K  252G   1% /dev/shm
tmpfs                                   252G   19M  252G   1% /run
tmpfs                                   252G     0  252G   0% /sys/fs/cgroup
tmpfs                                   5.1G   12K  5.1G   1% /run/user/42
tmpfs                                   5.1G     0  5.1G   0% /run/user/0
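
As a sanity check on the numbers above, here is a small Python sketch that pulls the size= option out of a mount line and converts it to GiB. It assumes the "k" suffix in the mount output means KiB, and the helper names are made up for illustration.

```python
import re


def parse_tmpfs_size_kib(mount_options):
    """Extract the size=<n>k option from a mount options string, or None."""
    match = re.search(r"size=(\d+)k", mount_options)
    return int(match.group(1)) if match else None


def kib_to_gib(size_kib):
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return size_kib / (1024 * 1024)


before_kib = parse_tmpfs_size_kib(
    "rw,nosuid,nodev,relatime,size=52802808k,mode=700,uid=1001,gid=1001"
)
after_kib = parse_tmpfs_size_kib(
    "rw,nosuid,nodev,relatime,size=5280284k,mode=700,uid=1001,gid=1001"
)
```

Here kib_to_gib(before_kib) is about 50.4 and kib_to_gib(after_kib) is about 5.04, which is consistent with the 51G and 5.1G that df -h reports (df appears to round up), and the two sizes differ by a factor of ten, matching the change from 10% to 1%.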

Setting RemoveIPC to no works: with it disabled, the /dev/shm/PostgreSQL.NNNN file appears to stay intact.
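
One way to verify this is to list the PostgreSQL.* files in /dev/shm before and after a login/logout cycle and compare. A minimal Python sketch, assuming the segment files follow the PostgreSQL.NNNN naming seen in the ls output above (the function names are hypothetical):

```python
from pathlib import Path


def postgres_shm_segments(shm_dir="/dev/shm"):
    """Return the sorted names of PostgreSQL.* files found in shm_dir."""
    return sorted(
        p.name for p in Path(shm_dir).glob("PostgreSQL.*") if p.is_file()
    )


def missing_segments(before, after):
    """Return segment names present in `before` but no longer in `after`."""
    return sorted(set(before) - set(after))
```

Call postgres_shm_segments() once while postgres is running, log in and out as the service user, call it again, and pass both lists to missing_segments(). An empty result means nothing was removed.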

This is the post I found that appears to be related:
http://lists.freedesktop.org/archives/systemd-devel/2014-April/018373.html


--

regards

marie gezeala bacuño II
