Hi. My company has servers that run PostgreSQL to store some logs and serve a Django web interface for managing the server itself. Over the last few days some servers stopped serving the web interface, and syslog shows this error:
Jun 23 04:40:19 django-1 postgres[8790]: [3-1] FATAL: remaining connection slots are reserved for non-replication superuser connections
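If I understand this message right, all of the ordinary connection slots are taken and only the slots held back by superuser_reserved_connections (3 by default) are still free. Since a superuser can still get in through those reserved slots, the sessions occupying the other slots can be listed from pg_stat_activity; a rough sketch, assuming the superuser role is called postgres and a 9.2-or-newer pg_stat_activity (for the state/state_change columns):

$ psql -h /tmp -U postgres -d db -c "SELECT pid, usename, state, state_change FROM pg_stat_activity ORDER BY state_change;"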
So I started looking for the cause:
$ ps auxf
...
postgres  4580  0.7  0.1 305156  19184 ?  S    Jun01 251:46 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
postgres  4746  0.0  1.6 305608 274656 ?  Ss   Jun01   4:09  \_ postgres: checkpointer process
postgres  4747  0.0  1.2 305288 208708 ?  Ss   Jun01   0:33  \_ postgres: writer process
postgres  4748  0.0  0.0 305288   9300 ?  Ss   Jun01   0:04  \_ postgres: wal writer process
postgres  4750  0.0  0.0 305972   2376 ?  Ss   Jun01   0:00  \_ postgres: autovacuum launcher process
postgres  4752  0.1  0.0  23852   1396 ?  Ss   Jun01  50:54  \_ postgres: stats collector process
postgres 63615  0.0  0.1 307036  22208 ?  Ss   Jun02   0:00  \_ postgres: db db [local] idle
postgres 22476  0.0  0.0 306368   2576 ?  Ss   Jun22   0:00  \_ postgres: db db [local] idle
postgres  4521  0.0  0.0 306512   6408 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4522  0.0  0.0 306512   6412 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4523  0.0  0.0 306512   6416 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4524  0.0  0.0 306512   6412 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4534  0.0  0.0 306512   6152 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4544  0.0  0.0 306512   6420 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4552  0.0  0.0 306512   6408 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4742  0.0  0.0 306512   6400 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4743  0.0  0.0 306512   6400 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4766  0.0  0.0 306512   6408 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4770  0.0  0.0 306512   6144 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4774  0.0  0.0 306512   6396 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4783  0.0  0.0 306512   6400 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4786  0.0  0.0 306512   6376 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4804  0.0  0.0 306512   6376 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4812  0.0  0.0 306512   6376 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4860  0.0  0.0 306512   6356 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4862  0.0  0.0 306516   6672 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4868  0.0  0.0 306516   6408 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4878  0.0  0.0 306516   6684 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4881  0.0  0.0 306516   6164 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4882  0.0  0.0 306516   6168 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4886  0.0  0.0 306500   6524 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4887  0.0  0.0 306500   6272 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4889  0.0  0.0 306500   6272 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4890  0.0  0.0 306500   6276 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  4907  0.0  0.0 306500   6796 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5131  0.0  0.0 306500   6268 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5138  0.0  0.0 306512   6116 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5142  0.0  0.0 306512   6116 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5143  0.0  0.0 306512   6644 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5151  0.0  0.0 306512   6120 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5154  0.0  0.0 306512   6904 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5155  0.0  0.0 306512   6128 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5156  0.0  0.0 306512   6120 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5157  0.0  0.0 306512   6380 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5162  0.0  0.0 306512   6120 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5165  0.0  0.0 306512   6384 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5172  0.0  0.0 306512   6128 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5174  0.0  0.0 306512   6120 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5188  0.0  0.0 306512   6124 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5189  0.0  0.0 306512   6636 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres  5190  0.0  0.0 306512   6104 ?  Ss   00:13   0:00  \_ postgres: db db [local] idle
postgres 26621  0.0  0.0 306512   6116 ?  Ss   12:54   0:00  \_ postgres: db db [local] idle
postgres  9600  0.0  0.0 306512   6120 ?  Ss   13:30   0:00  \_ postgres: db db [local] idle
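Counting the idle backends is easy with grep (the 'db db [local]' in the pattern is just the user/database pair from the process titles above):

$ ps aux | grep -c "postgres: db db \[local\] idle"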
That is 50 connections sitting idle. Looking further at one of them:
$ cat /proc/4521/stack
[<ffffffff81b59a8a>] unix_stream_recvmsg+0x2b9/0x633
[<ffffffff81acbf27>] __sock_recvmsg_nosec+0x29/0x2b
[<ffffffff81ace30c>] sock_recvmsg+0x65/0x88
[<ffffffff81ace409>] SYSC_recvfrom+0xda/0x134
[<ffffffff81acf4ef>] SyS_recvfrom+0x9/0xb
[<ffffffff81bd4062>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
All of these connections are stuck in this same stack.
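As far as I can tell, unix_stream_recvmsg here just means each backend is blocked waiting for the next command from its client over the Unix socket; if the client had closed its end, the recv would return EOF and the backend would exit. So the Django/WSGI processes apparently still hold all of these sockets open without ever sending anything. Attaching strace to one of the backends should show the same blocking recv from userspace, and ss can show the socket the backend is stuck on (assuming iproute2's ss is installed):

$ strace -p 4521        # sits in a blocking recvfrom() on the client socket
$ ss -xp | grep -w 4521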
The postmaster process:
$ cat /proc/4580/stack
[<ffffffff810ec9fc>] poll_schedule_timeout+0x3e/0x61
[<ffffffff810ed32f>] do_select+0x5ea/0x629
[<ffffffff810ed4f7>] core_sys_select+0x189/0x245
[<ffffffff810ed632>] SyS_select+0x7f/0xb5
[<ffffffff81bd4062>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
Some information about the server:
$ df -lh
Filesystem        Size  Used Avail Use% Mounted on
/dev/mapper/root  437G   13G  403G   3% /
/dev/sda1         2.0G  204M  1.6G  12% /boot
tmpfs             7.9G  224K  7.9G   1% /run
devtmpfs          7.8G     0  7.8G   0% /dev
shm               7.9G     0  7.9G   0% /dev/shm

$ free -m
             total       used       free     shared    buffers     cached
Mem:         16030      13221       2808          0         10       3486
-/+ buffers/cache:       9725       6304
Swap:            0          0          0

$ cat /proc/cpuinfo
...
processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz
stepping        : 9
microcode       : 0x15
cpu MHz         : 3092.951
cache size      : 8192 KB
...

# postgresql.conf #
external_pid_file = '/tmp/postgres.pid'
listen_addresses = ''
port = 5432
max_connections = 50
unix_socket_directories = '/tmp'
unix_socket_permissions = 0777
bonjour = off
authentication_timeout = 10min
ssl = on
ssl_ciphers = 'DEFAULT:!LOW:!EXP:!MD5:@STRENGTH'
ssl_renegotiation_limit = 512MB
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'
shared_buffers = 256MB
temp_buffers = 128MB
work_mem = 256MB
maintenance_work_mem = 256MB
max_stack_depth = 1MB
max_files_per_process = 1000
max_locks_per_transaction = 128
effective_cache_size = 512MB
wal_level = minimal
wal_writer_delay = 500ms
checkpoint_segments = 100
enable_indexscan = on
enable_sort = on
log_destination = 'syslog'
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
client_min_messages = error
log_min_messages = error
log_min_error_statement = fatal
log_min_duration_statement = -1
log_timezone = 'Brazil/East'
track_activities = on
track_counts = on
track_io_timing = off
autovacuum = on
autovacuum_max_workers = 1
autovacuum_naptime = 300min
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
search_path = 'public'
datestyle = 'iso, mdy'
intervalstyle = 'postgres'
timezone = 'Brazil/East'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
restart_after_crash = on
# end conf #
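For reference, with max_connections = 50 and the default superuser_reserved_connections = 3, ordinary roles get 47 usable slots, which is roughly what the ps output above shows filled. The effective values can be double-checked from a superuser session (same assumption as before about a postgres superuser role):

$ psql -h /tmp -U postgres -d db -c "SELECT name, setting FROM pg_settings WHERE name IN ('max_connections', 'superuser_reserved_connections');"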
I really don't know what is happening here, or why PostgreSQL hangs on to these connections instead of closing them. The Django web interface is used only for management and for viewing logs; most servers have only two users.
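As a stopgap, backends that have been idle for a long time can be terminated by hand to free the slots; something along these lines (pg_terminate_backend is the documented function for this, the one-hour cutoff is arbitrary):

$ psql -h /tmp -U postgres -d db -c "SELECT pid, pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND state_change < now() - interval '1 hour';"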
I'm sending this email just to ask whether any of you have seen this, or something like it, before.
Thank you!