On 07/04/2018 04:08 AM, Boblitz John wrote:
Good Morning,
Beginning yesterday morning, users have been unable to fully connect
to our DB Instance.
1.At the time of the initial report – I was connected to the DB via
pgAdmin and could perform queries without problem.
2.Users reported messages similar to “could not open file
"global/11801": No such file or directory”
3.At that time, connection logging was turned off and there were no
messages in the log files.
4.As this is a development environment, I turned logging on in the
config and restarted the DB
5.After restart, neither I, nor the Users could fully reconnect.
6.I have performed a SYS Level backup (tar of the whole postgres
directory tree)
7.I cannot perform a DB level Backup (same errors occur)
System Details
Linux Debian 7.11
Postgres 9.1 (9.1.24lts-0+d)
It appears that we can connect to the DB Server itself as I get
“connection received” and “connection authorized” – but when trying
to access the DB itself, several errors are raised (see below).
I am assuming that some internals are no longer consistent – the
file “global/11801” for instance really does not exist on the system.
Questions:
1.Is there any way to recover from this (backup is unfortunately
rather old) >
2.What are possible causes? I’d like to prevent this from happening
on my production servers.
Looks like something/someone deleted files from portions of the $DATA
directory. In particular from ~/global and ~/pg_tblspc. Without those
files you can't really proceed.
Can you recover by creating a new instance and restoring from a dump of
the production server?
As to exact cause the only thing I can think of is to to look at the
system logs at the time of the initial failure and see if there is
anything there that would shed light.
** I am aware that we are on older releases, and yes, we plan to
migrate to more current releases “soon” ™ …
Thanks in advance.
John Boblitz
Exceprt from Log:
2018-07-04 09:15:13 CEST 192.168.250.50(28559) [unknown]LOG:
connection received: host=192.168.250.50 port=28559
2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresLOG:
connection authorized: user=dbadmin database=postgres
2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresERROR: could
not open file "global/11801": No such file or directory
2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresSTATEMENT:
SELECT usecreatedb, usesuper, CASE WHEN usesuper THEN
pg_postmaster_start_time() ELSE NULL END as upsince, CASE WHEN
usesuper THEN pg_conf_load_time() ELSE NULL END as confloadedsince,
CASE WHEN usesuper THEN pg_is_in_recovery() ELSE NULL END as
inrecovery, CASE WHEN usesuper THEN pg_last_xlog_receive_location()
ELSE NULL END as receiveloc, CASE WHEN usesuper THEN
pg_last_xlog_replay_location() ELSE NULL END as replayloc, CASE WHEN
usesuper THEN pg_last_xact_replay_timestamp() ELSE NULL END as
replay_timestamp, CASE WHEN usesuper AND pg_is_in_recovery() THEN
pg_is_xlog_replay_paused() ELSE NULL END as isreplaypaused
FROM pg_user WHERE usename=current_user
2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresERROR: could
not open file "global/11801": No such file or directory
2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresSTATEMENT:
SELECT rolcreaterole, rolcreatedb FROM pg_roles WHERE rolname =
current_user;
2018-07-04 09:15:22 CEST 192.168.250.50(28561) [unknown]LOG:
connection received: host=192.168.250.50 port=28561
2018-07-04 09:15:22 CEST 192.168.250.50(28561) g11BaseLOG:
connection authorized: user=dbadmin database=g11Base
2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseERROR: could
not open file "pg_tblspc/24579/PG_9.1_201105231/24580/11866": No
such file or directory
2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseSTATEMENT:
SELECT CASE WHEN nspname LIKE E'pg\\_temp\\_%' THEN 1
WHEN (nspname LIKE E'pg\\_%') THEN 0
ELSE 3 END AS nsptyp,
nsp.nspname, nsp.oid,
pg_get_userbyid(nspowner) AS namespaceowner, nspacl,
description, has_schema_privilege(nsp.oid, 'CREATE') as cancreate,
(SELECT array_agg(label) FROM pg_seclabels sl1
WHERE sl1.objoid=nsp.oid) AS labels,
(SELECT array_agg(provider) FROM pg_seclabels sl2
WHERE sl2.objoid=nsp.oid) AS providers
FROM pg_namespace nsp
LEFT OUTER JOIN pg_description des ON
(des.objoid=nsp.oid AND des.classoid='pg_namespace'::regclass)
WHERE NOT ((nspname = 'pg_catalog' AND EXISTS
(SELECT 1 FROM pg_class WHERE relname = 'pg_class' AND relnamespace
= nsp.oid LIMIT 1)) OR
(nspname = 'pgagent' AND EXISTS (SELECT 1 FROM
pg_class WHERE relname = 'pga_job' AND relnamespace = nsp.oid LIMIT
1)) OR
(nspname = 'information_schema' AND EXISTS (SELECT
1 FROM pg_class WHERE relname = 'tables' AND relnamespace = nsp.oid
LIMIT 1)) OR
(nspname LIKE '_%' AND EXISTS (SELECT 1 FROM
pg_proc WHERE proname='slonyversion' AND pronamespace = nsp.oid
LIMIT 1))
) AND nspname NOT LIKE E'pg\\_temp\\_%'AND nspname
NOT LIKE E'pg\\_toast_temp\\_%' ORDER BY 1, nspname
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx